EF Part 1 - Can We Predict Stock Returns? Testing the Fama-French Model

Disclaimer: This series is adapted from my Empirical Finance project at VU Amsterdam, co-authored with Denzel van Beek and Marcin Grobelny. The data used in this post is simulated.
This series has four parts:
Part 1: Can We Predict Stock Returns? Testing the Fama-French Model (you are here)
Part 2: Did That Tax Law Actually Work? A Real-World Policy Analysis
Part 3: Predicting Up or Down: When Will Stocks Rise Tomorrow?
Part 4: Forecasting Trading Volume: When Simple Beats Complex
Introduction: The Quest to Explain Stock Returns
As an enthusiastic investor or an ambitious trader, you’ve probably asked yourself this question at least once in your life: “Can I predict tomorrow’s stock price?” ...

December 18, 2025 · 14 min · Brian Tran

FIN_EF - TT01 - Statistics Refresher and Linear Regression Exercises

Part I (Probability)
Q1. Random Variables
a. Gender: random, with no underlying covariate or bias introduced;
b. Number of crashes: random, depending on unpredictable factors such as hardware failures and software malfunctions;
c. Commute time: random, influenced by unpredictable variables such as transport delays, traffic jams, and accidents;
d. Computer assignment: random, with no obvious assignment process known;
e. Rainfall: random, owing to the randomness of the atmospheric system. Temperature, wind speed, humidity, and other factors may vary randomly.
Q4. Sample Mean
There is only a slim chance that the sample average weight of four students is exactly equal to the class (population) mean. The sample mean is generally close to, but not equal to, the population average because of sampling error. ...
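The Q4 argument about sampling error can be checked with a small simulation. The sketch below is only an illustration under assumed numbers: the class size of 30, the weight distribution (mean 70 kg, standard deviation 10 kg), and all variable names are hypothetical and do not come from the exercise.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical class of 30 students with simulated weights in kg
class_weights <- rnorm(30, mean = 70, sd = 10)
pop_mean <- mean(class_weights)

# Draw many samples of four students and record each sample mean
sample_means <- replicate(10000, mean(sample(class_weights, size = 4)))

# Share of samples whose mean equals the class mean exactly
share_exact <- mean(sample_means == pop_mean)

# Typical distance between a sample mean and the class mean
avg_abs_gap <- mean(abs(sample_means - pop_mean))

# The sample means are nevertheless centred on the class mean
mean_of_means <- mean(sample_means)
```

Under these assumptions, `share_exact` should be essentially zero while `mean_of_means` sits close to `pop_mean`, which is the "close to but not equal to" behaviour described above.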

September 25, 2025 · 3 min · Brian Tran

FIN_EF - TT03 - Panel Data Model and Logit Regression Exercises

Logit Model Exercises
Question 1
a. Describe ROC Curve
ROC curves usually look like a bow reaching towards the top-left corner of the chart. The more the curve bows out from the diagonal, the better the credit scoring model.
b. Estimate of the Probability of External Financing
    # Linear index z_i of Model 2
    mdl2_zi <- \(def, ass, iag, p_fin){-0.72 + 0.02*def + 0.0003*ass - 0.002*iag + 0.79*p_fin}
    # Logistic transformation of z_i into a probability
    mdl2_prob <- \(z){exp(z)/(1+exp(z))}
    prob_firm1 <- mdl2_prob(mdl2_zi(1.10, 1.00, 0.00, 0.00))
The probability of external financing of Firm 1 using Model 2 is 33.23%.
c. Marginal Effect Explanation
    # Marginal effect of p_fin: logistic density at z_i times its coefficient
    mdl2_me_pf <- \(z){(exp(z)/(1+exp(z))^2)*0.79}
    me_f1 <- mdl2_me_pf(mdl2_zi(1.10, 1.00, 0.00, 0.00))
    me_f2 <- mdl2_me_pf(mdl2_zi(0.13, 1.00, 0.00, 0.50))
The marginal effect generally varies across observations because it depends not only on the estimated coefficients but also on the evaluation point \(z_i\) through the density \(f(z_i)\). For instance, the marginal effects shrink at extreme probabilities as \(f(z_i)\) becomes smaller. ...
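As a cross-check on the hand-written formulas above, the same quantities can be reproduced with base R's logistic distribution functions. This is a minimal sketch: the coefficients are copied from the Model 2 index in the text, while the variable names and the grid of \(z\) values are chosen here purely for illustration.

```r
# Linear index for Firm 1, using the Model 2 coefficients from the text
z_firm1 <- -0.72 + 0.02 * 1.10 + 0.0003 * 1.00 - 0.002 * 0.00 + 0.79 * 0.00

# plogis() is the logistic CDF, exp(z) / (1 + exp(z))
prob_firm1 <- plogis(z_firm1)

# dlogis() is the logistic density, exp(z) / (1 + exp(z))^2
me_firm1 <- dlogis(z_firm1) * 0.79

# Illustrative grid: the density peaks at z = 0 and shrinks in both tails
z_grid <- c(-4, -2, 0, 2, 4)
me_grid <- dlogis(z_grid) * 0.79

round(prob_firm1, 4)  # about 0.3323
```

The rounded probability matches the 33.23 figure above when expressed in percent, and `me_grid` is largest at `z = 0` and smallest at the two ends of the grid, in line with the explanation that marginal effects shrink at extreme probabilities.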

September 25, 2025 · 4 min · Brian Tran

FIN_EF - TT04 - Panel Data, Endogeneity, and Diff-in-Diff Exercises

Panel Data, Endogeneity, and Diff-in-Diff Exercises
Question 4
a. Endogeneity Issue
The regression model omits an industry dummy variable (1 = Creative, 0 = Non-creative), i.e. \(\epsilon_{it} = c D_i^{CRE} + u_{it}\), which may create an endogeneity problem if \(Cov(F_{it}, D_i^{CRE}) \neq 0\).
b. Sign of Biased Coefficient
Assuming that \(F_{it}<0.5\) on average, we have \(Cov(F_{it}, D_i^{CRE}) > 0\) and \(Cov(R_{it}, D_i^{CRE}) > 0\), leading to a positively biased estimate of \(b\), i.e. \(\hat b > b\). ...
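The direction of the omitted-variable bias can be illustrated with simulated data. The sketch below is not the exercise's data or model specification: the sample size, the true coefficients (`b_true`, `c_true`), and the way `f` is made to correlate positively with the dummy are all assumptions picked for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 5000

# Hypothetical industry dummy (1 = Creative, 0 = Non-creative)
d_cre <- rbinom(n, size = 1, prob = 0.5)

# Regressor built to be positively correlated with the dummy
f <- 0.3 + 0.2 * d_cre + rnorm(n, sd = 0.1)

# Assumed true coefficients: b on f, c on the dummy (both positive)
b_true <- 1
c_true <- 2
r <- 0.5 + b_true * f + c_true * d_cre + rnorm(n)

# Short regression omits the dummy; long regression includes it
b_short <- unname(coef(lm(r ~ f))["f"])
b_long <- unname(coef(lm(r ~ f + d_cre))["f"])
```

With a positive covariance between the regressor and the omitted dummy and a positive effect of the dummy on the outcome, `b_short` should come out well above `b_true`, while `b_long` stays close to it.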

September 25, 2025 · 3 min · Brian Tran

S417 Financial Econometrics - Coca Cola Stock Volatility Analysis

Disclaimer: In this assignment, I have utilised Claude 4 Sonnet in various aspects, including clarifying the expectations of the questions, facilitating my understanding of the addressed concepts, generating roxygen2-style documentation for helper functions, debugging code, and proofreading.
Exercise 1 - Financial Data
a. Stylized Facts Analysis
i. Data Crawling & Preprocessing
    if (!file.exists("Data/price_dt.rds")) {
      # Download Data
      stock_ls <- c("COST", "WMT", "KO", "PEP")
      price_dt <- tq_get(stock_ls,
        get = "stock.prices",
        from = "2000-01-01",
        # to = as.character(Sys.Date() - 1)
        to = "2025-08-09"
      )
      # Save model data
      saveRDS(price_dt, file = "Data/price_dt.rds")
    } else {
      # Access saved stock data
      price_dt <- readRDS("Data/price_dt.rds")
    }
    # Prepare return data
    return_all_dt <- prep_return_dt(price_dt)
    # Extract different frequencies
    daily_returns <- return_all_dt$daily
    weekly_returns <- return_all_dt$weekly
    monthly_returns <- return_all_dt$monthly
    head(daily_returns)
    ## # A tibble: 6 × 9
    ##   symbol date       adjusted     ret grossret  logret    sqret absret  volume
    ##   <chr>  <date>        <dbl>   <dbl>    <dbl>   <dbl>    <dbl>  <dbl>   <dbl>
    ## 1 COST   2000-01-04     28.1 -0.0548    0.945 -0.0563 0.00317  0.0563 5722800
    ## 2 COST   2000-01-05     28.6  0.0171    1.02   0.0169 0.000287 0.0169 7726400
    ## 3 COST   2000-01-06     29.2  0.0201    1.02   0.0199 0.000396 0.0199 7221400
    ## 4 COST   2000-01-07     31.1  0.0662    1.07   0.0641 0.00411  0.0641 5164800
    ## 5 COST   2000-01-10     31.8  0.0208    1.02   0.0206 0.000425 0.0206 4454000
    ## 6 COST   2000-01-11     30.6 -0.0355    0.964 -0.0362 0.00131  0.0362 2955000
    # Calculate 5% quantile
    daily_q5_dt <- quantile(daily_returns |> pull(ret), probs = 0.05)
    weekly_q5_dt <- quantile(weekly_returns |> pull(ret), probs = 0.05)
    monthly_q5_dt <- quantile(monthly_returns |> pull(ret), probs = 0.05)
Compute summary statistics ...
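The helper `prep_return_dt()` is not shown in this excerpt, so the following is only a guess at how the return columns in the printed tibble relate to one another, written in base R on a few made-up prices. Judging from the printed values, `sqret` and `absret` appear to be the square and the absolute value of the log return (for example, 0.0563 squared is roughly 0.00317), but that reading is an assumption, as are the prices and object names below.

```r
# Hypothetical adjusted prices for one symbol (not the post's data)
prices <- data.frame(
  date = as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06")),
  adjusted = c(29.7, 28.1, 28.6, 29.2)
)

n <- nrow(prices)

# One row per return, dated at the later of the two prices
returns <- data.frame(
  date = prices$date[-1],
  adjusted = prices$adjusted[-1]
)

# Gross return P_t / P_{t-1}, simple return, and log return
returns$grossret <- prices$adjusted[-1] / prices$adjusted[-n]
returns$ret <- returns$grossret - 1
returns$logret <- log(returns$grossret)

# Assumed definitions: squared and absolute log returns
returns$sqret <- returns$logret^2
returns$absret <- abs(returns$logret)

# Empirical 5% quantile of simple returns, as in the excerpt
q5 <- quantile(returns$ret, probs = 0.05)
```

Squared and absolute returns are common volatility proxies in stylized-facts analyses, which may be why the helper carries them alongside the simple and log returns.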

September 25, 2025 · 57 min · Brian Tran

User Segmentation Using K-means in R

These days, companies, especially those operating online solutions, increasingly need to understand their paying users. As a result, the business intelligence unit is expected to provide an accurate and insightful segmentation analysis to business units. In this article, I will walk you through, step by step, how to conduct a segmentation analysis using K-means clustering.
Figure 1: Photo by John Lockwood on Unsplash ...

July 14, 2023 · 8 min · Brian Tran