---
title: "Multiple Linear Regression"
author: "Allan Omondi"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    fig_height: 5
    self_contained: false
    keep_md: true
  word_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    keep_md: true
  html_notebook:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    self_contained: false
  pdf_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 6
    fig_height: 6
    fig_crop: false
    keep_tex: true
    latex_engine: xelatex
---
```{r setup_chunk, message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Install pacman if absent, then load it regardless (the original loaded it
# only on a fresh install)
if (!"pacman" %in% installed.packages()[, "Package"]) {
  install.packages("pacman", dependencies = TRUE)
}
library("pacman")
pacman::p_load("here")
knitr::opts_knit$set(root.dir = here::here())
```
# Load the Dataset
```{r load_dataset, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("readr")
advertising_data <- read_csv("./data/sme_socialmedia_advertising_kenya.csv")
head(advertising_data)
```
# Initial EDA
[**View the Dimensions**]{.underline}
The number of observations and the number of variables.
```{r show_dimensions, echo=TRUE, message=FALSE, warning=FALSE}
dim(advertising_data)
```
[**View the Data Types**]{.underline}
```{r show_data_types_1, echo=TRUE, message=FALSE, warning=FALSE}
sapply(advertising_data, class)
```
```{r show_data_types_2, echo=TRUE, message=FALSE, warning=FALSE}
str(advertising_data)
```
[**Descriptive Statistics**]{.underline}
Understanding your data can lead to:
- **Data cleaning:** To remove extreme outliers or impute missing data.
- **Data transformation:** To reduce skewness
- **Hypothesis formulation:** Formulate a hypothesis based on the patterns you identify
- **Choosing the appropriate statistical test:** You may notice properties of the data such as distributions or data types that suggest the use of parametric or non-parametric statistical tests and algorithms
Descriptive statistics can be used to understand your data. Typical descriptive statistics include:
1. **Measures of frequency:** count and percent
2. **Measures of central tendency:** mean, median, and mode
3. **Measures of distribution/dispersion/spread/scatter/variability:** minimum, quartiles, maximum, variance, standard deviation, coefficient of variation, range, interquartile range (IQR) [includes a box and whisker plot for visualization], kurtosis, and skewness [includes a histogram for visualization].
4. **Measures of relationship:** covariance and correlation
## [**Measures of Frequency**]{.underline}
This is applicable in cases where you have categorical variables, e.g., 60% of the observations are male and 40% are female (2 categories).
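The advertising dataset used here is entirely numeric, so as a minimal sketch on a hypothetical categorical variable (the `gender` factor below is invented for illustration):

```{r frequency_example, echo=TRUE, message=FALSE, warning=FALSE}
# Hypothetical factor with 3 male and 2 female observations
gender <- factor(c("male", "male", "male", "female", "female"))
table(gender)                    # counts per category
prop.table(table(gender)) * 100  # percentages: female 40, male 60
```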
## [**Measures of Central Tendency**]{.underline}
The median and the mean of each numeric variable:
```{r central_tendency, echo=TRUE, message=FALSE, warning=FALSE}
summary(advertising_data)
```
The first 5 rows in the dataset:
```{r first_five, echo=TRUE, message=FALSE, warning=FALSE}
head(advertising_data, 5)
```
The last 5 rows in the dataset:
```{r last_five, echo=TRUE, message=FALSE, warning=FALSE}
tail(advertising_data, 5)
```
## [**Measures of Distribution**]{.underline}
Measuring the variability in the dataset is important because the amount of variability determines **how well you can generalize** results from the sample to a new observation in the population.
Low variability is ideal because it means that you can better predict information about the population based on the sample data. High variability means that the values are less consistent, thus making it harder to make predictions.
The syntax `dataset[rows, columns]` can be used to specify the exact rows and columns to be considered. `dataset[, columns]` implies that all rows will be considered. For example, `advertising_data[, -4]` selects every column except column number 4, which can equivalently be written as `advertising_data[, c(1, 2, 3)]` for a 4-column dataset. This allows us to perform calculations only on the columns that are numeric, leaving out columns that are factors (categorical) or that have a string data type.
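As a small illustration of this indexing, using a toy data frame invented for this sketch:

```{r indexing_example, echo=TRUE, message=FALSE, warning=FALSE}
# A toy data frame with one character column mixed in
toy <- data.frame(a = 1:3, b = c("x", "y", "z"), c = 4:6)
toy[, -2]                    # all rows, every column except column 2
sapply(toy[, c(1, 3)], var)  # apply a function to the numeric columns only
```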
### **Variance**
```{r distribution_variance, echo=TRUE, message=FALSE, warning=FALSE}
sapply(advertising_data[,], var)
```
### **Standard Deviation**
```{r distribution_standard_deviation, echo=TRUE, message=FALSE, warning=FALSE}
sapply(advertising_data[,], sd)
```
### **Kurtosis**
Kurtosis informs us of how often outliers occur in the results. There are different formulas for calculating kurtosis. Specifying `type = 2` selects the bias-corrected formula used in other statistical software such as SPSS and SAS. Note that this formula reports *excess* kurtosis, i.e., kurtosis relative to that of a normal distribution (whose Pearson kurtosis is 3), so the reference value is 0:
1. Excess kurtosis \< 0 implies a low number of outliers → platykurtic
2. Excess kurtosis ≈ 0 implies a medium number of outliers → mesokurtic
3. Excess kurtosis \> 0 implies a high number of outliers → leptokurtic
High kurtosis (leptokurtic) affects models that are sensitive to outliers. Estimates of the variance are also inflated. Low kurtosis (platykurtic) implies a possible underestimation of real-world variability. The typical remedy includes trimming outliers or using robust statistical methods that are less affected by outliers.
```{r distribution_kurtosis, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("e1071")
sapply(advertising_data[,], kurtosis, type = 2)
```
### **Skewness**
The skewness is used to identify the asymmetry of the distribution of results. Similar to kurtosis, there are several ways of computing the skewness.
Using `type = 2` (the formula used in other statistical software such as SPSS and SAS), the results can be interpreted as follows:
1. Skewness between -0.4 and 0.4 (inclusive) implies that there is no skew in the distribution of results; the distribution of results is symmetrical; it is a normal distribution; a Gaussian distribution.
2. Skewness above 0.4 implies a positive skew; a right-skewed distribution.
3. Skewness below -0.4 implies a negative skew; a left-skewed distribution.
Skewed data results in misleading averages and potentially biased model coefficients. The typical remedy to skewed data involves applying data transformations such as logarithmic, square-root, or Box–Cox, etc. to reduce skewness.
```{r distribution_skewness, echo=TRUE, message=FALSE, warning=FALSE}
sapply(advertising_data[,], skewness, type = 2)
```
As a data analyst, you need to confirm whether the distortion in kurtosis or skewness is a data problem or a real-world insight. For example, a real-world insight could be that sales were exceptionally high because of a viral marketing campaign.
## [**Measures of Relationship**]{.underline}
### **Covariance**
Covariance is a statistical measure that indicates the direction of the linear relationship between two variables. It assesses whether increases in one variable correspond to increases or decreases in another.
- **Positive Covariance:** When one variable increases, the other tends to increase as well.
- **Negative Covariance:** When one variable increases, the other tends to decrease.
- **Zero Covariance:** No linear relationship exists between the variables.
While covariance indicates the direction of a relationship, it does not convey the strength or consistency of the relationship. The correlation coefficient is used to indicate the strength of the relationship.
```{r distribution_covariance, echo=TRUE, message=FALSE, warning=FALSE}
cov(advertising_data, method = "spearman")
```
### **Correlation**
A strong correlation between variables enables us to better predict the value of the dependent variable from the value of the independent variable, whereas a weak correlation does not. Note that Pearson's correlation captures only linear associations; Spearman's rank correlation, used below, captures monotonic associations more generally.
We can measure the statistical significance of the correlation using Spearman's rank correlation *rho*. This shows us if the variables are significantly monotonically related. A monotonic relationship between two variables implies that as one variable increases, the other variable either consistently increases or consistently decreases. The key characteristic is the preservation of the direction of change, though the rate of change may vary.
**Option 1:** Conduct a correlation test between the dependent variable and each independent variable one at a time.
```{r distribution_correlation_1, echo=TRUE, message=FALSE, warning=FALSE}
cor.test(advertising_data$Sales, advertising_data$YouTube, method = "spearman")
cor.test(advertising_data$Sales, advertising_data$TikTok, method = "spearman")
cor.test(advertising_data$Sales, advertising_data$Facebook, method = "spearman")
```
**Option 2:** To view the correlation of all variables at the same time
```{r distribution_correlation_2, echo=TRUE, message=FALSE, warning=FALSE}
cor(advertising_data, method = "spearman")
```
## [**Basic Visualizations**]{.underline}
### **Histogram**
```{r visualization_histogram, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
par(mfrow = c(1, 2))
for (i in seq_along(advertising_data)) {
  if (is.numeric(advertising_data[[i]])) {
    hist(advertising_data[[i]],
         main = names(advertising_data)[i],
         xlab = names(advertising_data)[i])
  } else {
    message(paste("Column", names(advertising_data)[i], "is not numeric and will be skipped."))
  }
}
```
### **Box and Whisker Plot**
```{r visualization_boxplot, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
par(mfrow = c(1, 2))
for (i in seq_along(advertising_data)) {
  if (is.numeric(advertising_data[[i]])) {
    boxplot(advertising_data[[i]], main = names(advertising_data)[i])
  } else {
    message(paste("Column", names(advertising_data)[i], "is not numeric and will be skipped."))
  }
}
```
### **Missing Data Plot**
```{r missing_data_plot, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
pacman::p_load("Amelia")
missmap(advertising_data, col = c("red", "grey"), legend = TRUE)
```
### **Correlation Plot**
```{r correlation_plot, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
pacman::p_load("ggcorrplot")
ggcorrplot(cor(advertising_data[,]))
```
### **Scatter Plot**
```{r scatter_plot_1, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
# pairs() takes a data frame (or a one-sided formula), not a `y ~ .` formula
pairs(advertising_data)
```
```{r scatter_plot_2, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
pacman::p_load("ggplot2")
ggplot(advertising_data,
aes(x = YouTube, y = Sales)) +
geom_point() +
geom_smooth(method = lm) +
labs(
title = "Relationship between Sales Revenue and \nExpenditure on YouTube Marketing (KES)",
x = "Expenditure",
y = "Sales"
)
```
```{r scatter_plot_3, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
pacman::p_load("dplyr")
advertising_data_composite <- advertising_data %>%
mutate(Total_Expenditure = YouTube + TikTok + Facebook)
ggplot(advertising_data_composite,
aes(x = Total_Expenditure, y = Sales)) +
geom_point() +
geom_smooth(method = lm) +
labs(
title = "Relationship between Sales Revenue and \nTotal Marketing Expenditure (KES)",
x = "Total Marketing Expenditure",
y = "Sales"
)
```
# Statistical Test
We then apply a simultaneous multiple linear regression as a statistical test for regression. The term "simultaneous" refers to how the predictor variables are entered and considered in the statistical test. It means that all the predictor variables included in the model are entered and evaluated at the same time.
```{r statistical_test_SLR, echo=TRUE, message=FALSE, warning=FALSE}
mlr_test <- lm(Sales ~ YouTube + TikTok + Facebook, data = advertising_data)
```
View the summary of the model.
```{r statistical_test_result, echo=TRUE, message=FALSE, warning=FALSE}
summary(mlr_test)
```
To obtain a 95% confidence interval:
```{r 95_confidence_interval, echo=TRUE, message=FALSE, warning=FALSE}
confint(mlr_test, level = 0.95)
```
# Diagnostic EDA
## [**Test of Linearity**]{.underline}
For the model to pass the test of linearity, there should be no pattern in the distribution of residuals and the residuals should be randomly placed around the 0.0 residual line, i.e., the residuals should randomly vary around the mean of the value of the response variable.
```{r test_of_linearity, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
plot(mlr_test, which = 1)
```
## [**Test of Independence of Errors (Autocorrelation)**]{.underline}
This test is necessary to confirm that each observation is independent of the other. It helps to identify autocorrelation that is introduced when the data is collected over a close period of time or when one observation is related to another observation. Autocorrelation leads to underestimated standard errors and inflated t-statistics. It can also make findings appear more significant than they actually are. The "Durbin-Watson Test" can be used as a test of independence of errors (test of autocorrelation). A Durbin-Watson statistic close to 2 suggests no autocorrelation, while values approaching 0 or 4 indicate positive or negative autocorrelation, respectively.
For the Durbin-Watson test:
- The null hypothesis, H~0~, is that there is no autocorrelation (no autocorrelation = there is no correlation between residuals across time or across observations).
- The alternative hypothesis, H~a~, is that there is autocorrelation (autocorrelation = there is a correlation between residuals across time or across observations)
If the p-value of the Durbin-Watson statistic is greater than 0.05 then there is no evidence to reject the null hypothesis that "there is no autocorrelation".
```{r test_of_independence_of_errors, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("lmtest")
dwtest(mlr_test)
```
With a Durbin-Watson test statistic of 0.76 and a *p* \< 0.05 in this case, we reject the null hypothesis that states that there is no autocorrelation. In other words, we conclude that the data contains autocorrelation.
## [**Test of Normality**]{.underline}
The test of normality of the distribution of the errors assesses whether the errors (residuals) are approximately normally distributed, i.e., most errors are close to zero and large errors are rare. A Q-Q plot can be used to conduct the test of normality.
A Q-Q plot is a scatterplot of the quantiles of the errors against the quantiles of a normal distribution. Quantiles are statistical values that divide a dataset or probability distribution into equal-sized intervals. They help in understanding how data is distributed by marking specific points that separate the data into groups of equal size. Examples of quantiles include: quartiles (4 equal parts), percentiles (100 equal parts), deciles (10 equal parts), etc.
If the points in the Q-Q plot fall along a straight line, then the normality assumption is satisfied. If the points in the Q-Q plot do not fall along a straight line, then the normality assumption is not satisfied.
```{r test_of_normality, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
plot(mlr_test, which = 2)
```
## [**Test of Homoscedasticity**]{.underline}
Homoscedasticity requires that the spread of residuals should be constant across all levels of the independent variable. A scale-location plot (a.k.a. spread-location plot) can be used to conduct a test of homoscedasticity.
The x-axis shows the fitted (predicted) values from the model and the y-axis shows the square root of the standardized residuals. The red line is added to help visualize any patterns.
In a model with homoscedastic errors (equal variance across all predicted values):
- Points should be randomly scattered around a horizontal line
- The smooth line should be approximately horizontal
- The vertical spread of points should be roughly equal across all fitted values
- No obvious patterns, funnels, or trends should be visible
Points forming a cone shape that widens from left to right suggests heteroscedasticity with increasing variance for larger fitted values.
```{r test_of_homoscedasticity, echo=TRUE, fig.width=6, message=FALSE, warning=FALSE}
plot(mlr_test, which = 3)
```
**Breusch-Pagan Test**
The Breusch-Pagan Test can also be used in addition to the visual inspection of a Scale-Location plot.
Formally:
- Null hypothesis (H₀): The residuals are homoscedastic (equal variance).
- Alternative hypothesis (H₁): The residuals are heteroscedastic (non-constant variance).
p-Value:
- p-value ≥ 0.05: Fail to reject H₀ → no evidence of heteroscedasticity → good, model passes.
- p-value \< 0.05: Reject H₀ → evidence of heteroscedasticity → bad, model fails.
Interpretation: If the p-value is less than 0.05, then we reject the null hypothesis that states that “the residuals are homoscedastic”
```{r Breusch-PaganTest, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("lmtest")
lmtest::bptest(mlr_test)
```
With a p-value \< 0.01 in this case, there is statistically significant evidence of heteroscedasticity in the residuals (which is bad).
## [**Quantitative Validation of Assumptions**]{.underline}
The graphical representations of the various tests of assumptions should be accompanied by quantitative values. The `gvlma` package (Global Validation of Linear Models Assumptions) is useful for this purpose.
```{r QuantitativeValidationofAssumptions, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("gvlma")
gvlma_results <- gvlma(mlr_test)
summary(gvlma_results)
```
## Test of Multicollinearity
Multicollinearity arises when two or more independent variables (predictors) are highly intercorrelated. The **Variance Inflation Factor (VIF)** quantifies how much the variance of a coefficient estimate is “inflated” due to multicollinearity.
A VIF of 1 indicates no collinearity; values above 5 suggest problematic levels of collinearity. High VIF values (VIF \> 5) suggest that the coefficient estimates are less reliable due to the correlations between predictors.
```{r multicollinearity, echo=TRUE, message=FALSE, warning=FALSE}
pacman::p_load("car")
vif(mlr_test)
```
# Interpretation of the Results
We can interpret the results of the statistical test with more confidence if the tests of assumptions are successful. The presentation of the results and their subsequent interpretation are based on the following notes.
**t-Statistic t(d.f.):** It quantifies how many standard errors the estimated coefficient deviates from zero. A larger t-value (e.g., \>2) indicates stronger evidence against the null hypothesis (i.e., that the coefficient is zero). The t-statistic has its corresponding p-value such that a p-value \< .05 implies a statistically significant t-statistic.
**Degrees of Freedom (d.f.):** Degrees of freedom refers to the number of values in a calculation that are free to vary. It is essentially a measure of how much independent information is available for estimating a statistical parameter.
For example: Imagine you need to calculate the average height of 5 people, and you know that the sum of all their heights is 340 inches. If you know the heights of 4 of these people (65, 70, 68, and 72 inches), you can automatically determine the height of the fifth person without measuring them: 340 - (65 + 70 + 68 + 72) = 65 inches. In this example, even though there are 5 people, you only have 4 degrees of freedom because once you know 4 heights and the total, the 5th height is no longer “free to vary”: it is determined by the other values.
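The height example can be checked directly:

```{r degrees_of_freedom_example, echo=TRUE, message=FALSE, warning=FALSE}
known_heights <- c(65, 70, 68, 72)  # four of the five heights
total <- 340                        # known sum of all five heights
total - sum(known_heights)          # the fifth height is determined: 65
```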
**F-Statistic**
**F(d.f. in numerator, d.f. in denominator):** The numerator degrees of freedom correspond to the number of predictor variables, while the denominator degrees of freedom are the total number of observations minus the number of predictors, minus 1 for the intercept.
The F-test in regression evaluates whether the variance explained by the model is significantly greater than the unexplained variance (error). Think of the F-statistic as a ratio of “signal” (useful prediction) to “noise” (unexplained variation). The higher this ratio, the more confident you can be that your model is capturing something real. The larger the F-Statistic, the better the model’s performance.
Also, a low p-value of the F-statistic (any p-value \< .05 is considered low) indicates that the overall regression model **is statistically significant**.
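As a sketch of how the F-statistic and its p-value can be extracted in R, using the built-in `mtcars` dataset (32 observations, 2 predictors here) rather than the advertising data:

```{r f_statistic_example, echo=TRUE, message=FALSE, warning=FALSE}
toy_model <- lm(mpg ~ wt + hp, data = mtcars)
fstat <- summary(toy_model)$fstatistic  # named vector: value, numdf, dendf
fstat
# Overall model p-value from the F distribution: F(2, 29) since 32 - 2 - 1 = 29
pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
```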
**Coefficient of Determination (R^2^)**
The R-squared value represents the proportion of the total variation in the dependent variable that can be attributed to or explained by the independent variable. An R-squared of 0.96 indicates that approximately 96% of the variability in the dependent variable can be explained by its linear relationship with the independent variable. An R-squared value approaching 1 signifies that the regression line closely aligns with the observed data points.
**Multiple R-squared:** Measures the proportion of variance in the dependent variable explained by the independent variables (e.g., Multiple R^2^ = 0.6 means 60% of sales variance is explained by advertising expenditure). The multiple R-squared always increases (or at least never decreases) when you add more independent variables.
**Adjusted R-squared:** Also measures the proportion of variance in the dependent variable explained by the independent variables; however, it introduces a penalty based on the number of independent variables relative to the sample size.
The difference between the multiple R-squared and the adjusted R-squared is negligible in cases where there is only 1 independent variable.
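The two quantities can be read off any fitted model; a sketch using the built-in `mtcars` dataset:

```{r r_squared_example, echo=TRUE, message=FALSE, warning=FALSE}
toy_model <- lm(mpg ~ wt + hp, data = mtcars)
summary(toy_model)$r.squared      # multiple R-squared
summary(toy_model)$adj.r.squared  # adjusted R-squared (penalized, so smaller)
```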
**Residual Standard Error**
The residual standard error quantifies the average magnitude of the errors (residuals), which are the discrepancies between the observed values in the dataset and the values predicted by the regression model. It represents the standard deviation of the data points around the regression line. For example, a residual standard error of 7.73 indicates that, on average, the model's predicted value of the dependent variable deviates from the actual observed value by approximately 7.73 units.
A smaller residual standard error implies that the data points are more tightly clustered around the regression line, indicating a more precise model.
**Confidence Interval**
A 95% confidence interval (CI) for a parameter—such as a regression coefficient—provides a range that, under repeated sampling, would contain the true (but unknown) population parameter 95% of the time. Analogy: Imagine shooting arrows at a target. If you drew a circle around where 95% of your arrows landed, that circle is like a confidence interval—it captures the region in which your “shots” (i.e., estimates from different samples) tend to fall.
**Uncertainty quantification:** A CI communicates your estimate’s precision—narrower intervals imply more precise estimates (often due to larger samples or less variability), whereas wider intervals indicate greater uncertainty about the true value.
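A large-sample 95% CI for a coefficient can be sketched as estimate ± 1.96 × SE; the numbers below are hypothetical, not taken from the fitted model:

```{r ci_sketch_example, echo=TRUE, message=FALSE, warning=FALSE}
beta_hat <- 2.5  # hypothetical coefficient estimate
se_hat <- 0.3    # hypothetical standard error
beta_hat + c(-1, 1) * 1.96 * se_hat  # approximate 95% CI: [1.912, 3.088]
```

Note that `confint()` uses the exact t-distribution quantile rather than 1.96, so the two will differ slightly in small samples.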
## Limitations and Diagnostic Findings
**Methodological Limitation:** Lack of experimental variation in advertisement expenditure limits causal attribution to any single platform.
**Diagnostic Findings:** The regression residuals exhibited evidence of heteroscedasticity, as indicated by a significant Breusch–Pagan test, $\chi^2$(3) = 304.78, p \< .001. Additionally, the Durbin–Watson test revealed evidence of positive autocorrelation in the residuals, *D* = 0.76, *p* \< .01. These results suggest that the assumptions of constant error variance and independence were not fully met, thereby limiting the validity of the standard error estimates and the inferential tests (e.g., t-tests and F-tests).
## Academic Statement (APA)—Academic-Ready Language: Without data transformation
A simultaneous multiple linear regression analysis was conducted on data from 250,000 observations (*N* = 250,000) to examine whether advertising expenditures on YouTube, TikTok, and Facebook collectively predict Sales. The results indicated that expenditure on YouTube ($\beta$ = 8.08, 95% CI [8.02, 8.13], SE = 0.02, *t*(249,996) = 277.54, *p* \< .01), TikTok ($\beta$ = 7.30, 95% CI [7.22, 7.39], SE = 0.04, *t*(249,996) = 165.63, *p* \< .01), and Facebook ($\beta$ = 7.84, 95% CI [7.77, 7.91], SE = 0.04, *t*(249,996) = 222.12, *p* \< .01) each significantly predicted Sales (all large t-values and *p* \< .05). The model explained 38.06% of the variance in Sales (Multiple R^2^ = 0.38, Adjusted R^2^ = 0.38, *F*(3, 249,996) = 51,210, *p* \< .01). The intercept was 236,800, 95% CI [228,608.10, 244,953.50], SE = 4,170, *t*(249,996) = 56.78, *p* \< .01. The residual standard error was 587,400. This indicates that the predictors contributed significantly towards explaining Sales; however, the diagnostics revealed violations of homoscedasticity and independence of residuals (autocorrelation), suggesting that the reliability of the standard errors, and hence the model's inferential robustness, may be limited. The results are presented in the table below.
| Predictor | $\beta$ | 95% CI | SE | *t*(249,996) | *p* |
|:-----------:|:-------:|:------------------------:|:-----:|:------------:|:------:|
| (Intercept) | 236,800 | [228,608.10, 244,953.50] | 4,170 | 56.78 | \< .01 |
| YouTube | 8.08 | [8.02, 8.13] | 0.02 | 277.54 | \< .01 |
| TikTok | 7.30 | [7.22, 7.39] | 0.04 | 165.63 | \< .01 |
| Facebook | 7.84 | [7.77, 7.91] | 0.04 | 222.12 | \< .01 |
: Regression Coefficients Predicting Sales from Multiple Advertising Channels
***Note.*** *N* = 250,000; *SE* = standard error; *CI* = confidence interval.
Although all advertising channels significantly predicted Sales, YouTube expenditure had a slightly stronger effect ($\beta$ = 8.08) compared to Facebook ($\beta$ = 7.84) and TikTok ($\beta$ = 7.30). This suggests that, among the three platforms, YouTube contributes marginally more to Sales when holding other expenditures constant.
## Business Analysis (Boardroom-Ready Language)
For every one Kenyan Shilling (KES) increase in YouTube advertising expenditure, Sales are predicted to increase by approximately 8.08 shillings, holding TikTok and Facebook expenditure constant. This is slightly higher than the corresponding effects for Facebook (7.84) and TikTok (7.30).
Recommendation:
- Management should continue to treat YouTube, TikTok, and Facebook as complementary pillars of a unified digital marketing strategy. However, a modest increase in YouTube advertising expenditure is advisable, given its slightly stronger association with Sales performance. Continuous monitoring and ROI analysis should be maintained to ensure sustained effectiveness across platforms.
# References and Further Reading
American Psychological Association. (2025, February). *Journal Article Reporting Standards (JARS)*. APA Style. Retrieved April 28, 2025, from <https://apastyle.apa.org/jars>
Hodeghatta, U. R., & Nayak, U. (2023). *Practical Business Analytics Using R and Python: Solve Business Problems Using a Data-driven Approach* (2nd ed.). Apress. <https://link.springer.com/book/10.1007/978-1-4842-8754-5>