{"id":9875,"date":"2020-09-28T08:00:51","date_gmt":"2020-09-28T08:00:51","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=9875"},"modified":"2023-10-27T13:27:33","modified_gmt":"2023-10-27T13:27:33","slug":"using-regression-with-correlated-data","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/using-regression-with-correlated-data\/","title":{"rendered":"Using Regression with Correlated Data"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9875\" class=\"elementor elementor-9875\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-232ebd elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"62768\" data-id=\"232ebd\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4aa54376\" data-eae-slider=\"35791\" data-id=\"4aa54376\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7f470bef elementor-widget elementor-widget-text-editor\" data-id=\"7f470bef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"has-medium-font-size wp-block-paragraph\">Tutorial (including R code) for using Generalized Estimating Equations and Multilevel Models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c2f6\">While regression models are easy to run given their short, simple syntax, this accessibility also makes it easy to use regression inappropriately. 
These models have several key assumptions that need to be met in order for their output to be valid, but your code will typically run whether or not these assumptions have been met.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-09323e4 elementor-widget elementor-widget-video\" data-id=\"09323e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;youtube_url&quot;:&quot;https:\\\/\\\/www.youtube.com\\\/watch?v=UQDWWorLZE8&quot;,&quot;video_type&quot;:&quot;youtube&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-wrapper elementor-open-inline\">\n\t\t\t<div class=\"elementor-video\"><\/div>\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dfd9ab9 elementor-widget elementor-widget-text-editor\" data-id=\"dfd9ab9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"bd3e\">For linear regression (used with a continuous outcome), these assumptions are as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Independence: All observations are independent of each other, residuals are uncorrelated<\/strong><\/li>\n<li>Linearity: The relationship between X and Y is linear<\/li>\n<li>Homoscedasticity: Constant variance of residuals at different values of X<\/li>\n<li>Normality: Data should be normally distributed around the regression line<\/li>\n<\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-76fba34 elementor-widget elementor-widget-text-editor\" data-id=\"76fba34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"f735\">For logistic regression (used with a binary or ordinal categorical outcome), these assumptions are as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Independence: All observations are independent of each other, residuals are uncorrelated<\/strong><\/li>\n<li>Linearity in the logit: The relationship between X and the logit of Y is linear<\/li>\n<li>Model is correctly specified, including lack of multicollinearity<\/li>\n<\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38d1cf7 elementor-widget elementor-widget-text-editor\" data-id=\"38d1cf7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"a1d7\">In both kinds of simple regression models, independent observations are absolutely necessary to fit a valid model. If your data points are correlated, this assumption of independence is violated. 
Fortunately, there are still ways to produce a valid regression model with correlated data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2677864 elementor-widget elementor-widget-heading\" data-id=\"2677864\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Correlated Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4e195d elementor-widget elementor-widget-text-editor\" data-id=\"f4e195d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"2431\">Correlation in data occurs primarily through multiple measurements (e.g. two measurements are taken on each participant 1 week apart, and data points within individuals are not independent) or through clustering in the data (e.g. a survey is conducted among students attending different schools, and data points from students within a given school are not independent).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a0a3\">The result is that the outcome has been measured at the level of an individual observation, but there is a\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6071979\/\" rel=\"noopener\">second level<\/a>\u00a0(either an individual, in the case of multiple time points, or a cluster) within which individual data points can be correlated. 
Ignoring this correlation means that standard error cannot be accurately computed, and in most cases will be artificially low.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a13678e elementor-widget elementor-widget-text-editor\" data-id=\"a13678e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"a7ab\">The best way to know if your data is correlated is simply through familiarity with your data and the collection process that produced it. If you know that you have repeated measures from the same individuals or have data on participants who can be grouped into families or schools, you can assume that your data points are probably not independent. You can also\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6071979\/\" rel=\"noopener\">investigate your data for possible correlation<\/a>\u00a0by calculating the ICC (intraclass correlation coefficient) to determine how correlated data points are within possible groups, or by looking for correlation in your residuals.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f3b7a54 elementor-widget elementor-widget-heading\" data-id=\"f3b7a54\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Regression Modeling with Correlated Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c28bd52 elementor-widget elementor-widget-text-editor\" data-id=\"c28bd52\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p 
class=\"wp-block-paragraph\" id=\"5f81\">As previously mentioned, simple regression will produce inaccurate standard errors with correlated data and therefore should not be used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"e5cd\">Instead, you want to use models that can account for the correlation that is present in your data. If the correlation is due to some grouping variable (e.g. school) or repeated measures over time, then you can choose between Generalized Estimating Equations or Multilevel Models. These modeling techniques can handle either binary or continuous outcome variables, so can be used to replace either logistic or linear regression when the data are correlated.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-efcf61b elementor-widget elementor-widget-heading\" data-id=\"efcf61b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Generalized Estimating Equations<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fab067a elementor-widget elementor-widget-text-editor\" data-id=\"fab067a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"1a02\">Generalized estimating equations (GEE) will give you beta estimates that are the same or similar to those produced by simple regression, but with appropriate standard errors. Generalized estimating equations are particularly useful when you have repeated measures for the same individuals or units. This modeling technique tends to work well when you have many small clusters, which is often the result of having a few measurements on a large number of participants. 
GEE also allows the user to\u00a0specify one of numerous correlation structures, which can be a useful feature depending on your data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-28a4e87 elementor-widget elementor-widget-heading\" data-id=\"28a4e87\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Multilevel Modeling<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1eba6f9 elementor-widget elementor-widget-text-editor\" data-id=\"1eba6f9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"94da\"><a href=\"http:\/\/www.bristol.ac.uk\/cmm\/learning\/multilevel-models\/what-why.html\" rel=\"noopener\">Multilevel modeling<\/a>\u00a0(MLM) also provides appropriate standard errors when data points are not independent. It is typically the best modeling approach when the user is interested in relationships both within and between clustered groups, and is not simply looking to account for the effect of correlation in standard error estimates. MLM has the additional advantage of being able to handle more than two levels in the response variable. 
The primary drawback of MLM models is that they require larger sample sizes within each cluster, so may not work well when clusters are small.<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-292d463 elementor-widget elementor-widget-text-editor\" data-id=\"292d463\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"da07\">Both GEE and MLM are fairly easy to use in R. Below, I will walk through examples with the two most common kinds of correlated data: data with repeated measures from individuals and data collected from individuals with an important grouping variable (in this case, country). I will fit simple regression, GEE, and MLM models with each dataset, and will discuss which modeling technique is best for these different data types.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7687d08 elementor-widget elementor-widget-heading\" data-id=\"7687d08\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Example 1: Data from the Fragile Families &amp; Child Wellbeing Study<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-62e0a67 elementor-widget elementor-widget-text-editor\" data-id=\"62e0a67\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"8ddb\">The data that I will be working with first comes from Years 9 and 15 of the Princeton University\u00a0<a 
href=\"https:\/\/fragilefamilies.princeton.edu\/\" rel=\"noopener\">Fragile Families &amp; Child Wellbeing Study<\/a>, which follows the families of selected children born between 1998 and 2000 in major US cities. Data are publicly available, and can be accessed by submitting a brief request on the Fragile Families\u00a0<a href=\"https:\/\/opr.princeton.edu\/archive\/restricted\/Default.aspx\" rel=\"noopener\">Data and Documentation<\/a>\u00a0page. Since this study follows up with the same families year after year, data points from the same family units at different time points are not independent.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"3994\">This dataset contains dozens of variables representing the health and wellbeing of participating children and their parents. Being in psychiatric epidemiology, I am primarily interested in examining the children\u2019s mental wellbeing. Participating children are asked if they frequently feel sad, and I will be using answers to this \u201coften feeling sad\u201d question as my outcome. Since substance use is tied to poorer mental wellbeing among adolescents, I will be using variables representing alcohol and tobacco use as predictors*.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-54a1f98 elementor-widget elementor-widget-text-editor\" data-id=\"54a1f98\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"cecd\">*Note: Models created in this article are for demonstration purposes only and should not be considered to be meaningful. I have not considered confounding, mediation, other model assumptions, or other possible data issues in the construction of these models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"d376\">First, let\u2019s load the packages that we\u2019ll be using. 
I\u2019ve loaded \u201ctidyverse\u201d to clean our data, \u201chaven\u201d because the data we\u2019ll be reading in comes in SAS format, \u201cgeepack\u201d to run our GEE model, and \u201clme4\u201d to run our multilevel model:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">library(tidyverse)<br \/>library(haven)<br \/>library(geepack)<br \/>library(lme4)<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a83fa8c elementor-widget elementor-widget-text-editor\" data-id=\"a83fa8c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"ee78\">Now let\u2019s do some data cleaning to get these data ready for modeling!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"8328\">Data from Years 9 and 15 are housed in separate SAS files (identifiable by the .sas7bdat extension), so we have one code chunk to read in and clean each file. This cleaning has to be done separately because variable names and coding differ slightly between study years (see the\u00a0<a href=\"https:\/\/opr.princeton.edu\/archive\/restricted\/Default.aspx\" rel=\"noopener\">Data and Documentation<\/a>\u00a0page for codebooks).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9059\">There are hundreds of variables included in the datasets, so we first select those that will be used in our model and assign meaningful variable names that are consistent across data frames. 
Next, we filter the data to only include individuals with complete data for our variables of interest (the code below excludes individuals with missing data for these variables as well as those who refused to answer).<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-158dcc4 elementor-widget elementor-widget-text-editor\" data-id=\"158dcc4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"6a04\">We then recode our variables in the standard 1 = \u201cyes\u201d, 0 = \u201cno\u201d format. For the \u201cfeel_sad\u201d variable, this also means dichotomizing a variable with 4 levels which represent varying degrees of sadness. We end up with a binary variable where 1 = \u201csad\u201d and 0 = \u201cnot sad.\u201d Some regression techniques can handle multiple levels in your response variable (MLM included), but I have binarized it here for simplicity. 
Finally, we create a \u201ctime_order\u201d variable indicating whether the observation comes from the first or second round of the study.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Year 9 data: keep the variables of interest, drop missing or refused responses, recode to binary<br \/>year_9 = read_sas(\".\/data\/FF_wave5_2020v2_SAS.sas7bdat\") %&gt;% <br \/>  select(idnum, k5g2g, k5f1l, k5f1j) %&gt;% <br \/>  rename(\"feel_sad\" = \"k5g2g\",<br \/>         \"tobacco\" = \"k5f1l\",<br \/>         \"alcohol\" = \"k5f1j\") %&gt;% <br \/>  filter(<br \/>    tobacco == 1 | tobacco == 2,<br \/>    alcohol == 1 | alcohol == 2,<br \/>    feel_sad == 0 | feel_sad == 1 | feel_sad == 2 | feel_sad == 3<br \/>  ) %&gt;% <br \/>  mutate(<br \/>    tobacco = ifelse(tobacco == 1, 1, 0),<br \/>    alcohol = ifelse(alcohol == 1, 1, 0),<br \/>    feel_sad = ifelse(feel_sad == 0, 0, 1),<br \/>    time_order = 1<br \/>  )<br \/><br \/># Year 15 data: same steps, with this wave's variable names and response coding<br \/>year_15 = read_sas(\".\/data\/FF_wave6_2020v2_SAS.sas7bdat\") %&gt;% <br \/>  select(idnum, k6d2n, k6d40, k6d48) %&gt;% <br \/>  rename(\"feel_sad\" = \"k6d2n\",<br \/>         \"tobacco\" = \"k6d40\",<br \/>         \"alcohol\" = \"k6d48\") %&gt;% <br \/>  filter(<br \/>    tobacco == 1 | tobacco == 2,<br \/>    alcohol == 1 | alcohol == 2,<br \/>    feel_sad == 1 | feel_sad == 2 | feel_sad == 3 | feel_sad == 4<br \/>  ) %&gt;% <br \/>  mutate(<br \/>    tobacco = ifelse(tobacco == 1, 1, 0),<br \/>    alcohol = ifelse(alcohol == 1, 1, 0),<br \/>    feel_sad = ifelse(feel_sad == 4, 0, 1),<br \/>    time_order = 2<br \/>  )<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dc7e038 elementor-widget elementor-widget-text-editor\" data-id=\"dc7e038\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"42e4\">We then combine data from Years 9 and 15 by stacking our two cleaned data frames using rbind(). 
The rbind() function works well here because both data frames now share all variable names. We next transform the \u201cidnum\u201d variable (which identifies unique family units) into a numeric variable so that it can be properly used to sort the data in the final code chunk. This step is necessary because the geeglm() function that we will be using to run the GEE model assumes that the data frame is sorted first by a unique identifier (in this case, \u201cidnum\u201d), and next by the order of observations (indicated here by the new \u201ctime_order\u201d variable).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">fragile_families = rbind(year_9, year_15) %&gt;% <br \/>  mutate(<br \/>    idnum = as.numeric(idnum)<br \/>  )<br \/><br \/># Sort by family ID, then by study round, as geeglm() expects<br \/>fragile_families = <br \/>  fragile_families[<br \/>  with(fragile_families, order(idnum, time_order)),<br \/>]<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-73adea8 elementor-widget elementor-widget-text-editor\" data-id=\"73adea8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"fefd\">The above code produces the following cleaned data frame, which is now ready to be used for regression modeling:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-3ab4e58 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"71663\" data-id=\"3ab4e58\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d8186bf\" 
data-eae-slider=\"50797\" data-id=\"d8186bf\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f7651b8 elementor-widget elementor-widget-image\" data-id=\"f7651b8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"698\" height=\"461\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_ZPU_DxEtBDPMAuE4PuXYeA.png\" class=\"attachment-large size-large wp-image-33797\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_ZPU_DxEtBDPMAuE4PuXYeA.png 698w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_ZPU_DxEtBDPMAuE4PuXYeA-300x198.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_ZPU_DxEtBDPMAuE4PuXYeA-610x403.png 610w\" sizes=\"(max-width: 698px) 100vw, 698px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5d9873a elementor-widget elementor-widget-text-editor\" data-id=\"5d9873a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"1f26\">Let\u2019s fit our models:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Simple Logistic Regression<\/strong><\/li>\n<\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-96048ed elementor-widget elementor-widget-text-editor\" data-id=\"96048ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n\n<p class=\"wp-block-paragraph\" 
id=\"b387\">First, we use the glm() function to fit a simple logistic regression model using the \u201cfragile_families\u201d data. Since we have a binary outcome variable, \u201cfamily = binomial\u201d is used to specify that logistic regression should be used. We also use tidy() from the \u201cbroom\u201d package to clean up the model output. We are creating this model for comparison purposes only \u2014 as indicated before, the independence assumption has been violated and the standard errors associated with this model will not be valid!<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">glm(formula = feel_sad ~ tobacco + alcohol, <br \/>    family = binomial, data = fragile_families) %&gt;% <br \/>  broom::tidy()<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-994194f elementor-widget elementor-widget-text-editor\" data-id=\"994194f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"502c\">The above code produces the following output, which the subsequent modeling approaches will be compared to. 
Tobacco and alcohol use both appear to be significant predictors of sadness in participating children.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6865f5e elementor-widget elementor-widget-image\" data-id=\"6865f5e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"121\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1024x121.png\" class=\"attachment-large size-large wp-image-33798\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1024x121.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-300x35.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-768x91.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-610x72.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-750x89.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1140x135.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q.png 1462w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec0bec9 elementor-widget elementor-widget-text-editor\" data-id=\"ec0bec9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"4e5f\"><strong>2. 
Generalized Estimating Equations<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-499c84a elementor-widget elementor-widget-text-editor\" data-id=\"499c84a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"252e\">The syntax used to specify a GEE model using the geeglm() function from the \u201cgeepack\u201d package is fairly similar to that used with the standard glm() function. The \u201cformula\u201d, \u201cfamily\u201d, and \u201cdata\u201d arguments are exactly the same for both functions. What\u2019s new are the \u201cid,\u201d \u201cwaves,\u201d and \u201ccorstr\u201d arguments (see\u00a0<a href=\"https:\/\/cran.r-project.org\/web\/packages\/geepack\/geepack.pdf\" rel=\"noopener\">package documentation<\/a>\u00a0for all available arguments). The unique identifier that links observations from the same subject is specified in the \u201cid\u201d argument. In this case the ID is \u201cidnum,\u201d the unique identifier assigned to each family participating in the study. The \u201ctime_order\u201d variable created during data cleaning comes into play in the \u201cwaves\u201d argument, where it indicates the order in which observations were made. Finally, \u201ccorstr\u201d can be used to specify the within-subject correlation structure. \u201cIndependence\u201d is actually the default input for this argument, and it makes sense in this context because it is useful when clusters are small. However, \u201cexchangeable\u201d can be specified when all observations within a subject can be considered to be equally correlated, and \u201car1\u201d is a better fit when observations taken closer together in time are more strongly correlated than those taken further apart. 
Information on choosing the right correlation structure can be found\u00a0<a href=\"https:\/\/online.stat.psu.edu\/stat504\/node\/181\/\" rel=\"noopener\">here<\/a>\u00a0and\u00a0<a href=\"https:\/\/stats.stackexchange.com\/questions\/83577\/gee-choosing-proper-working-correlation-structure\" rel=\"noopener\">here<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">geeglm(formula = feel_sad ~ tobacco + alcohol, <br \/>       family = binomial, id = idnum, data = fragile_families, <br \/>       waves = time_order, corstr = \"independence\") %&gt;% <br \/>  broom::tidy()<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38a3793 elementor-widget elementor-widget-text-editor\" data-id=\"38a3793\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"df6b\">Our GEE model gives us the following output:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-148ac10 elementor-widget elementor-widget-image\" data-id=\"148ac10\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"121\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-1024x121.png\" class=\"attachment-large size-large wp-image-33799\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-1024x121.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-300x35.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-768x91.png 768w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-610x72.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-750x89.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1-1140x135.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_rSYW1x1jb53Uyp51tdBk8Q-1.png 1462w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ec9183 elementor-widget elementor-widget-text-editor\" data-id=\"6ec9183\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"a8f1\">As you can see, our beta estimates are exactly the same as those produced using glm(), but standard error differs slightly now that the correlations in the data have been accounted for. While tobacco and alcohol are still significant predictors of sadness, the p-values are somewhat different**. If these p-values were closer to 0.05, having accurate standard error measurements could easily push a p-value over or under the level of significance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"b88b\">**Note: The test statistics for GEE and logistic regression look drastically different, but this is only because the test statistic provided in the logistic regression output is a Z-statistic and the test statistic provided in the GEE output is a Wald statistic. The Z-statistic is calculated by dividing the estimate by the standard error, while the Wald statistic is calculated by squaring the result of dividing the estimate by the standard error. 
The two values are therefore mathematically related (the Wald statistic is the square of the Z-statistic), and taking the square root of the values in the GEE \u201cstatistic\u201d column shows a much more moderate change from the initial Z-statistics.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e6c873f elementor-widget elementor-widget-text-editor\" data-id=\"e6c873f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"dc95\">With the geeglm() function, it is also important to verify that your clusters have been properly recognized. You can do this by running the above code without the broom::tidy() step:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">geeglm(formula = feel_sad ~ tobacco + alcohol, <br \/>       family = binomial, id = idnum, data = fragile_families, <br \/>       waves = time_order, corstr = \"independence\")<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-961dca3 elementor-widget elementor-widget-text-editor\" data-id=\"961dca3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"d921\">This code produces the output shown below. Look at the last line of the output, where \u201cNumber of clusters\u201d and \u201cMaximum cluster size\u201d are reported. We had two observations for each of several thousand individuals, so these values make sense in the context of our data and indicate that clusters were registered correctly by the function. 
If, however, the number of clusters is equal to the number of rows in your dataset, something is not working properly (most likely the sorting of your data is off).<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d371c7c elementor-widget elementor-widget-image\" data-id=\"d371c7c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"506\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w.png\" class=\"attachment-large size-large wp-image-33800\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w.png 864w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w-300x176.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w-768x450.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w-610x357.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_Y0FoNMQsqtSNXMwL0SKf0w-750x439.png 750w\" sizes=\"(max-width: 864px) 100vw, 864px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-7b56ad1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"52322\" data-id=\"7b56ad1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d37130c\" 
data-eae-slider=\"70315\" data-id=\"d37130c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f4f4265 elementor-widget elementor-widget-text-editor\" data-id=\"f4f4265\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"8185\"><strong>3. Multilevel Modeling<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9e45\">Next, let\u2019s fit a multilevel model using glmer() from the\u00a0<a href=\"https:\/\/www.rdocumentation.org\/packages\/lme4\/versions\/1.1-23\" rel=\"noopener\">lme4 package<\/a>. Again, the required code is almost identical to that used for logistic regression. The only required change is specifying the random effects in the formula argument. This is done with the \u201c(1 | idnum)\u201d term, which has the following structure: (effects that vary across groups | grouping variable). The grouping variable, in this case \u201cidnum,\u201d is specified to the right of the |, so each individual gets their own random intercept, and the \u201c1\u201d on the left indicates that only the intercept varies: we don\u2019t want the predictors\u2019 effects (slopes) to vary across groups. 
A\u00a0<a href=\"http:\/\/www.rensenieuwenhuis.nl\/r-sessions-16-multilevel-model-specification-lme4\/\" rel=\"noopener\">useful blog post<\/a>\u00a0by Rense Nieuwenhuis provides various examples of this glmer() syntax.<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4222d9 elementor-widget elementor-widget-text-editor\" data-id=\"f4222d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"769a\">The broom::tidy() function does not handle lme4 models directly, so instead we store a summary of the fitted model and pull the coefficient table from it.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Fit a random-intercept logistic model and keep its summary<br \/>mlm = summary(glmer(formula = <br \/>                    feel_sad ~ tobacco + alcohol + (1 | idnum), <br \/>                    data = fragile_families, family = binomial))<br \/><br \/># Coefficient table: estimates, standard errors, Z- and p-values<br \/>mlm$coefficients<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-09b33ae elementor-widget elementor-widget-text-editor\" data-id=\"09b33ae\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"3e06\">Again, the output is similar to that of the simple logistic regression model, and both tobacco and alcohol use are still significant predictors of sadness. Estimates vary slightly from those produced using the glm() and geeglm() functions because groupings in the data are no longer ignored or treated as an annoyance to be addressed by correcting the standard errors; instead, they are now incorporated as an important part of the model. 
Standard error estimates are higher for all estimates in comparison to those produced through logistic regression, and Z- and p-values remain similar but reflect these important changes in the estimate and standard error values.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1593a6f elementor-widget elementor-widget-image\" data-id=\"1593a6f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"794\" height=\"149\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw.png\" class=\"attachment-large size-large wp-image-33801\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw.png 794w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw-300x56.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw-768x144.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw-610x114.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_YkMrkmzjSna-d8tM9l5yqw-750x141.png 750w\" sizes=\"(max-width: 794px) 100vw, 794px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-408fb9b elementor-widget elementor-widget-heading\" data-id=\"408fb9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Example 2: Data from the Global School-Based Student Health Survey (GSHS)<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d0bbfde elementor-widget 
elementor-widget-text-editor\" data-id=\"d0bbfde\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"2445\">The second dataset that we will walk through comes from the WHO\u2019s\u00a0<a href=\"https:\/\/www.cdc.gov\/gshs\/index.htm\" rel=\"noopener\">Global School-Based Student Health Survey<\/a>\u00a0(GSHS). This survey is conducted among schoolchildren aged 13\u201317 with the goals of helping countries to determine health priorities, establishing the prevalences of health-related behaviors, and facilitating direct comparison of these prevalences across nations. We will be using data from two countries,\u00a0<a href=\"https:\/\/www.cdc.gov\/gshs\/countries\/seasian\/indonesia.htm\" rel=\"noopener\">Indonesia<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.cdc.gov\/gshs\/countries\/seasian\/bangladesh.htm\" rel=\"noopener\">Bangladesh<\/a>, which can be downloaded directly from these countries\u2019 respective descriptive pages.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"7fee\">The data are cross-sectional: an identical survey was conducted one time among schoolchildren in both nations. I am interested in using variables from this dataset to describe the relationship between whether or not a child has friends, whether or not the child is bullied (my predictors) and whether or not the child has seriously contemplated suicide (my outcome). It is likely that these relationships differ between the two countries and that children are more similar to other children from the same country. 
Therefore, knowing whether a child is from Indonesia or Bangladesh provides important information about that child\u2019s responses and the assumption of independent observations is violated.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-15e2470 elementor-widget elementor-widget-text-editor\" data-id=\"15e2470\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"6e34\">Let\u2019s load packages again:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">library(tidyverse)<br \/>library(haven)<br \/>library(lme4)<br \/>library(gee)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"8606\">Note that the \u201cgeepack\u201d package has been replaced with the \u201cgee\u201d package. The \u201cgee\u201d package is easier to use (in my opinion) with data that is clustered by a grouping variable such as country rather than within an individual who has multiple observations.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-df0c143 elementor-widget elementor-widget-text-editor\" data-id=\"df0c143\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"eb04\">Next, let\u2019s load in the data (which is also in SAS format, so we use the \u201chaven\u201d package again) and conduct some basic cleaning. 
Data cleaning here follows a similar structure to the procedure used with the Fragile Families &amp; Child Wellbeing Study data: important variables are selected and assigned meaningful, consistent names, and a new variable is created to indicate which cluster an observation belongs to (in this case the new \u201ccountry\u201d variable).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Indonesia: keep the three survey items and label the cluster<br \/>indonesia = read_sas(\".\/data\/IOH2007_public_use.sas7bdat\") %&gt;% <br \/>  select(q21, q25, q27) %&gt;% <br \/>  rename(<br \/>    \"bullied\" = \"q21\",<br \/>    \"suicidal_thoughts\" = \"q25\",<br \/>    \"friends\" = \"q27\"<br \/>  ) %&gt;% <br \/>  mutate(<br \/>    country = 1<br \/>  )<br \/><br \/># Bangladesh: same items, stored under different question numbers<br \/>bangladesh = read_sas(\".\/data\/bdh2014_public_use.sas7bdat\") %&gt;% <br \/>  select(q20, q24, q27) %&gt;% <br \/>  rename(<br \/>    \"bullied\" = \"q20\",<br \/>    \"suicidal_thoughts\" = \"q24\",<br \/>    \"friends\" = \"q27\"<br \/>  ) %&gt;% <br \/>  mutate(<br \/>    country = 2<br \/>  )<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cd47ac0 elementor-widget elementor-widget-text-editor\" data-id=\"cd47ac0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"36b0\">Again, the two data frames are stacked together. Since variables were coded consistently during collection in both countries, some cleaning can be conducted only once using this combined dataset. Missing data is eliminated, and all variables are converted from string format to numeric. 
Finally, variables are mutated into a consistent, binarized format.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:preformatted -->\n<pre class=\"wp-block-preformatted\">survey = rbind(indonesia, bangladesh) %&gt;% <br \/>  mutate(<br \/>    suicidal_thoughts = as.numeric(suicidal_thoughts),<br \/>    friends = as.numeric(friends),<br \/>    bullied = as.numeric(bullied),<br \/>    suicidal_thoughts = ifelse(suicidal_thoughts == 1, 1, 0),<br \/>    friends = ifelse(friends == 1, 0, 1),<br \/>    bullied = ifelse(bullied == 1, 0, 1)<br \/>  ) %&gt;% <br \/>  drop_na()<\/pre>\n<!-- \/wp:preformatted -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-effd59f elementor-widget elementor-widget-text-editor\" data-id=\"effd59f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"7e33\">Our cleaned data frame now looks like this:<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-75e7b61 elementor-widget elementor-widget-image\" data-id=\"75e7b61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"632\" height=\"461\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_UkLhOyPoqhICfn907fFUUw.png\" class=\"attachment-large size-large wp-image-33802\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_UkLhOyPoqhICfn907fFUUw.png 632w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_UkLhOyPoqhICfn907fFUUw-300x219.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_UkLhOyPoqhICfn907fFUUw-610x445.png 610w\" sizes=\"(max-width: 632px) 100vw, 632px\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d5cb422 elementor-widget elementor-widget-text-editor\" data-id=\"d5cb422\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3d59\">Let\u2019s fit our models:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol>\n<li><strong>Simple Logistic Regression<\/strong><\/li>\n<\/ol>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p id=\"bf42\">With the exception of variable names and the data specified, the glm() code remains identical to that used with the Fragile Families study data.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:preformatted -->\n<pre class=\"wp-block-preformatted\">glm(formula = suicidal_thoughts ~ bullied + friends, <br \/>    family = binomial, data = survey) %&gt;% <br \/>  broom::tidy()<\/pre>\n<!-- \/wp:preformatted -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-387b1cb elementor-widget elementor-widget-text-editor\" data-id=\"387b1cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"cb12\">Unsurprisingly, whether or not a child has friends and whether or not a child is bullied are both significant predictors of the presence of suicidal thoughts in this sample.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-273a1be elementor-widget elementor-widget-image\" data-id=\"273a1be\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"130\" 
src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-1024x130.png\" class=\"attachment-large size-large wp-image-33803\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-1024x130.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-300x38.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-768x97.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-610x77.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-750x95.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ-1140x144.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_n_jDjhEJrchNkNrX5oKmnQ.png 1492w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d5ab0ff elementor-widget elementor-widget-heading\" data-id=\"d5ab0ff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">2. Generalized Estimating Equations<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-73b562e elementor-widget elementor-widget-text-editor\" data-id=\"73b562e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe gee() function in the gee package allows us to easily use GEE with our survey data. 
This function is a better fit than the previously used geeglm() function as data are not correlated over time, but rather by a separate variable that can be indicated with the \u201cid\u201d argument (in this case, \u201ccountry\u201d). The formula and family arguments remain identical to those used with the glm() function, and the \u201ccorstr\u201d argument works the same way as it does with the geeglm() function. However, unlike the geepack package, the gee package is not compatible with the broom::tidy() function, so output is viewed using the summary() function instead.\n\n<!-- wp:preformatted -->\n<pre class=\"wp-block-preformatted\"># Store the fit under its own name so it does not mask the gee() function<br \/>gee_fit = gee(suicidal_thoughts ~ bullied + friends, data = survey, <br \/>              id = country, family = binomial, <br \/>              corstr = \"exchangeable\")<br \/><br \/># Naive and robust standard errors appear side by side in the summary<br \/>summary(gee_fit)<\/pre>\n<!-- \/wp:preformatted -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-decaa6f elementor-widget elementor-widget-text-editor\" data-id=\"decaa6f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"f15f\">One of the reasons that I particularly like the gee() function is that the naive standard errors and Z-test statistics are included in the output (naive meaning that these values are produced by regression where clustering is not accounted for; you\u2019ll see that these are exactly the same as those produced by the glm() function above). You\u2019ll notice drastic changes in the standard errors and Z-test statistics produced using GEE (\u201cRobust\u201d), although both of our predictors remain significant. 
It appears that accounting for within-country correlation yields much lower standard errors here, although robust standard errors based on only two clusters should be interpreted with caution.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d438be2 elementor-widget elementor-widget-image\" data-id=\"d438be2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"943\" height=\"781\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw.png\" class=\"attachment-large size-large wp-image-33804\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw.png 943w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw-300x248.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw-768x636.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw-610x505.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_BhpUh95BxjaxnDwOD7Dilw-750x621.png 750w\" sizes=\"(max-width: 943px) 100vw, 943px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e2c6d0e elementor-widget elementor-widget-text-editor\" data-id=\"e2c6d0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"c1a6\"><strong>3. Multilevel Modeling***<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"b794\">***Note: As mentioned above, models are for demonstration purposes only and are not necessarily valid. 
In this case, we would want more than two groups for our multilevel model (meaning data from additional countries). If you are really only using two groups with MLM, you should consider a\u00a0<a href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/00273171.2017.1344538?journalCode=hmbr20\" rel=\"noopener\">small sample size correction<\/a>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"4453\">Finally, we try MLM with the survey dataset. The code has the same structure as that used with the Fragile Families study data, but with the new formula, grouping variable, and dataset specified.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:preformatted -->\n<pre class=\"wp-block-preformatted\"># Random intercept for each country; keep the model summary<br \/>mlm = summary(glmer(formula = <br \/>                    suicidal_thoughts ~ bullied + friends + <br \/>                    (1 | country), <br \/>                    data = survey, family = binomial))<br \/><br \/># Coefficient table: estimates, standard errors, Z- and p-values<br \/>mlm$coefficients<\/pre>\n<!-- \/wp:preformatted -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a52d7be elementor-widget elementor-widget-text-editor\" data-id=\"a52d7be\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"9a6a\">Again, beta estimates and standard error estimates are now adjusted slightly from those produced using glm(). 
Z- and p-values associated with the \u201cbullied\u201d and \u201cfriends\u201d variables are slightly smaller, although bullying and having friends remain significant predictors of suicidal thoughts.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9ebc96b elementor-widget elementor-widget-image\" data-id=\"9ebc96b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"772\" height=\"155\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A.png\" class=\"attachment-large size-large wp-image-33805\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A.png 772w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A-300x60.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A-768x154.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A-610x122.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/1_5y5bvLUYQNLAf9oUp-1t1A-750x151.png 750w\" sizes=\"(max-width: 772px) 100vw, 772px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-96dbc0f elementor-widget elementor-widget-heading\" data-id=\"96dbc0f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\">Which model is best for these examples?<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section 
elementor-top-section elementor-element elementor-element-cbd5d31 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"4291\" data-id=\"cbd5d31\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4bf08f6\" data-eae-slider=\"60317\" data-id=\"4bf08f6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-567cae4 elementor-widget elementor-widget-text-editor\" data-id=\"567cae4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"9f21\">Data from Princeton University\u2019s Fragile Families &amp; Child Wellbeing Study would be best represented using GEE. This is due to the maximum cluster size of 2 observations, the fact that individual families have multiple data points over time, and the fact that we were more interested in accounting for grouping in the standard error estimates than actually assessing differences between families.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"03d9\">Multilevel modeling is most appropriate for data from the Global School-Based Student Health Survey (GSHS) because the data were collected cross-sectionally and can be divided into two large clusters. 
Additionally, the output could be further explored to determine both within- and between-group variances, and we might be interested in relationships both within and across countries.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"b141\">How you account for violations of the independent observations assumption will depend on the structure of your data and your general knowledge of the data collection process, as well as whether or not you consider the correlation to be an annoyance to adjust for or something meaningful to explore.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"bdfa\"><strong>In conclusion, regression is flexible and certain regression models can handle correlated data.\u00a0<\/strong>However, it is always important to check the assumptions of a given technique and to make sure that your analytic strategy is appropriate for your data.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Regression is flexible and certain regression models can handle correlated data. However, it is always important to check the assumptions of a given technique and to make sure that your analytic strategy is appropriate for your data. 
<\/p>\n","protected":false},"author":924,"featured_media":9876,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187],"tags":[671,670,669],"ppma_author":[3685],"class_list":["post-9875","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-correlated-data","tag-models","tag-regression"],"authors":[{"term_id":3685,"user_id":924,"is_guest":0,"slug":"emily-halford","display_name":"Emily Halford","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/Emily-Halford-150x150.jpg","author_category":"","user_url":"http:\/\/nysi.org","last_name":"Halford","first_name":"Emily","job_title":"","description":"Emily Halford is Data Analyst at New York State Psychiatric Institute."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9875","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/924"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=9875"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9875\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/9876"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=9875"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=9875"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=9875"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-jso
n\/wp\/v2\/ppma_author?post=9875"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}