Data Cleaning 101 in SQL - #3.1 A discussion on the Nature of Outliers

I am glad to have you for the third part of my complete guide on data cleaning.

#1: Tidying Messy Data
#2: Dealing with Missing Data
#3.1: A discussion on the Nature of Outliers <– You are here
#3.2: The Origin of Outliers & Detection Techniques
#4.1: Where does Data Duplication come from?
#4.2: A Practical Tutorial for Data Deduplication

I like weird people. The black sheep, the odd ducks, the rejects, the eccentric, the loners, the lost and forgotten. More often than not, these people have the most beautiful souls.
- Unknown ...

September 25, 2025 · 8 min · Brian Tran

FIN_EF - TT01 - Statistics Refresher n Linear Regression Exercises

Part I (Probability)

Q1. Random Variables
a. Gender: random, no underlying covariate or bias introduced;
b. No. of crashes: random, depending on unpredictable factors: hardware failures, software malfunctions, etc.;
c. Commute time: random, influenced by unpredictable variables: transport delays, traffic jams, accidents;
d. Computer assignment: random, no obvious assignment process known;
e. Rainfall: random, due to the randomness of the atmospheric system. Temperature, wind speed, humidity, and other factors may vary randomly.

Q4. Sample Mean
There is a slim chance that the sample average weight of four students is exactly equal to the class (population) mean. In general, however, the sample mean is close to but not equal to the population mean because of sampling error. ...
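The Q4 argument can be checked with a short simulation. The sketch below is not part of the original exercise: the class size of 40, the simulated weights, and the seed are made-up values used only for illustration.

```r
# Hypothetical class of 40 students; weights (kg) are simulated, not real data.
set.seed(42)
class_weights <- round(rnorm(40, mean = 65, sd = 10), 1)
pop_mean <- mean(class_weights)

# Draw many samples of four students (without replacement) and keep each sample mean.
sample_means <- replicate(10000, mean(sample(class_weights, size = 4)))

# Share of samples whose mean matches the class mean (up to floating-point tolerance).
share_exact <- mean(abs(sample_means - pop_mean) < 1e-9)

# Average absolute gap between a sample mean and the class mean (the sampling error).
avg_abs_gap <- mean(abs(sample_means - pop_mean))

print(c(pop_mean = pop_mean, share_exact = share_exact, avg_abs_gap = avg_abs_gap))
```

Under these assumptions, `share_exact` should be at or near zero while `avg_abs_gap` stays clearly positive, which is the pattern the answer describes.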

September 25, 2025 · 3 min · Brian Tran

FIN_EF - TT03 - Panel Data Model n Logit Regression Exercises

Logit Model Exercises

Question 1

a. Describe ROC Curve
ROC curves usually look like a bow reaching out to the top left corner of the chart. The more the curve bows out from the diagonal, the better the credit scoring model.

b. Estimate of the Probability of External Financing

    mdl2_zi <- \(def, ass, iag, pfin) -0.72 + 0.02 * def + 0.0003 * ass - 0.002 * iag + 0.79 * pfin
    mdl2_prob <- \(z) exp(z) / (1 + exp(z))
    prob_firm1 <- mdl2_prob(mdl2_zi(1.10, 1.00, 0.00, 0.00))

The probability of external financing of Firm 1 using Model 2 is 33.23%.

c. Marginal Effect Explanation

    mdl2_me_pf <- \(z) (exp(z) / (1 + exp(z))^2) * 0.79
    me_f1 <- mdl2_me_pf(mdl2_zi(1.10, 1.00, 0.00, 0.00))
    me_f2 <- mdl2_me_pf(mdl2_zi(0.13, 1.00, 0.00, 0.50))

The marginal effect generally varies across observations because it depends not only on the estimated coefficients but also on the evaluation point through $z_i$, i.e. $f(z_i)$. For instance, the marginal effects shrink at extreme probabilities as $f(z_i)$ becomes smaller. ...
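For reference, the dependence on the evaluation point can be written out explicitly. This is the standard logit marginal-effect formula rather than something taken from the original solution; 0.79 is the pfin coefficient of Model 2, and $P_i$ is shorthand introduced here for the fitted probability.

```latex
P_i = \frac{e^{z_i}}{1 + e^{z_i}}, \qquad
\frac{\partial P_i}{\partial\, \mathrm{pfin}_i} = 0.79 \, f(z_i), \qquad
f(z_i) = \frac{e^{z_i}}{\left(1 + e^{z_i}\right)^{2}} = P_i \,(1 - P_i)
```

Since $P_i(1 - P_i)$ peaks at 0.25 when $P_i = 0.5$ and falls toward zero as $P_i$ approaches 0 or 1, the same coefficient translates into a smaller marginal effect at extreme probabilities.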

September 25, 2025 · 4 min · Brian Tran

FIN_EF - TT04 - Panel Data, Endogeneity, and Diff-in-Diff Exercises

Panel Data, Endogeneity, and Diff-in-Diff Exercises

Question 4

a. Endogeneity Issue
The regression model omits a dummy variable for Industry (1 = Creative, 0 = Non-creative), i.e. $\epsilon_{it} = c D_i^{CRE} + u_{it}$, which may create an endogeneity problem if $Cov(F_{it}, D_i^{CRE}) \neq 0$.

b. Sign of Biased Coefficient
Assume that $F_{it} < 0.5$ on average; then $Cov(F_{it}, D_i^{CRE}) > 0$ and $Cov(R_{it}, D_i^{CRE}) > 0$, leading to a positively biased estimate of $b$, i.e. $\hat{b} > b$. ...
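The direction of the bias follows from the usual omitted-variable-bias expression. The sketch below rests on assumptions not shown in the excerpt: a baseline model of the form $R_{it} = a + b F_{it} + \epsilon_{it}$, an error $u_{it}$ that is uncorrelated with $F_{it}$, and $c > 0$.

```latex
\operatorname{plim}\, \hat{b}
  = \frac{\operatorname{Cov}(F_{it}, R_{it})}{\operatorname{Var}(F_{it})}
  = b + c \, \frac{\operatorname{Cov}(F_{it}, D_i^{CRE})}{\operatorname{Var}(F_{it})}
```

With $c > 0$ and $Cov(F_{it}, D_i^{CRE}) > 0$, the second term is positive, so under these assumptions the estimate overstates $b$.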

September 25, 2025 · 3 min · Brian Tran

S417 Financial Econometrics - Coca Cola Stock Volatility Analysis

Disclaimer: In this assignment, I have utilised Claude 4 Sonnet in various aspects, including clarifying the expectations of the questions, facilitating my understanding of the addressed concepts, roxygen2-style document generation for helper functions, code debugging, and proofreading.

Exercise 1 - Financial Data

a. Stylized Facts Analysis

i. Data Crawling & Preprocessing

    if (!file.exists("Data/price_dt.rds")) {
      # Download Data
      stock_ls <- c("COST", "WMT", "KO", "PEP")
      price_dt <- tq_get(stock_ls,
        get = "stock.prices",
        from = "2000-01-01",
        # to = as.character(Sys.Date() - 1)
        to = "2025-08-09"
      )
      # Save model data
      saveRDS(price_dt, file = "Data/price_dt.rds")
    } else {
      # Access saved stock data
      price_dt <- readRDS("Data/price_dt.rds")
    }

    # Prepare return data
    return_all_dt <- prep_return_dt(price_dt)

    # Extract different frequencies
    daily_returns <- return_all_dt$daily
    weekly_returns <- return_all_dt$weekly
    monthly_returns <- return_all_dt$monthly

    head(daily_returns)

    ## # A tibble: 6 × 9
    ##   symbol date       adjusted     ret grossret  logret    sqret absret  volume
    ##   <chr>  <date>        <dbl>   <dbl>    <dbl>   <dbl>    <dbl>  <dbl>   <dbl>
    ## 1 COST   2000-01-04     28.1 -0.0548    0.945 -0.0563  0.00317 0.0563 5722800
    ## 2 COST   2000-01-05     28.6  0.0171     1.02  0.0169 0.000287 0.0169 7726400
    ## 3 COST   2000-01-06     29.2  0.0201     1.02  0.0199 0.000396 0.0199 7221400
    ## 4 COST   2000-01-07     31.1  0.0662     1.07  0.0641  0.00411 0.0641 5164800
    ## 5 COST   2000-01-10     31.8  0.0208     1.02  0.0206 0.000425 0.0206 4454000
    ## 6 COST   2000-01-11     30.6 -0.0355    0.964 -0.0362  0.00131 0.0362 2955000

    # Calculate 5% quantile
    daily_q5_dt <- quantile(daily_returns |> pull(ret), probs = 0.05)
    weekly_q5_dt <- quantile(weekly_returns |> pull(ret), probs = 0.05)
    monthly_q5_dt <- quantile(monthly_returns |> pull(ret), probs = 0.05)

Compute summary statistics ...

September 25, 2025 · 57 min · Brian Tran

Major Issues in Causal AI dismantled by a Quantum Physicist at Spotify

The Uprising of A Miraculous Magic - Causal AI

Throughout history, humanity has adapted and evolved relentlessly, from an early primate to the dominant, powerful, and transformative force on Earth. On that journey, Homo sapiens has been through four industrial revolutions (IRs), driven by constant curiosity about the world. Recently, the transformative power of Large Language Models (LLMs) has marked what many consider the Fifth Industrial Revolution, a subtle initiative fundamentally reshaping economic dynamics and individual livelihoods worldwide. Within this transformation, Causal Artificial Intelligence (AI) emerges as a breakthrough in the field of Machine Learning (ML), addressing what Pearl (1) criticized: “Machines’ lack of understanding of causal relations is perhaps the biggest roadblock to giving them human-level intelligence.” Overlooking such revolutionary knowledge would undoubtedly be an oversight. ...

June 22, 2025 · 5 min · Brian Tran

7 Deadly Fears I Have Faced - Part 3 - A Data Analyst Journal

Hello everyone! I hope you all had an enjoyable Lunar New Year holiday. I hope this year brings you joy, prosperity, and good fortune. It has been too long since I last wrote here, but I have finally found some time to post a new blog update.

Restrain yourself from having an Outburst, photo by Ahmad Dirini on Unsplash

I am glad to have you for the third part of my career journal. ...

February 12, 2024 · 9 min · Brian Tran

7 Deadly Fears I Have Faced - Part 2 - A Data Analyst Journal

Leaving your fears behind, photo on Isha Foundation

I am glad to have you for the second part of my career journal.

#1: 7 Deadly Fears I Have Faced - Part 1
#2: 7 Deadly Fears I Have Faced - Part 2 <– You are here
#3: 7 Deadly Fears I Have Faced - Part 3

They Don’t Know Sh*t About My Great Artwork - The Pride

The Fear Of Becoming Arrogant

Rather than being tormented by the Fear of Not Being Valued in my early career, I gradually learned the art of tempering the discomfort upon receiving criticism and grew indifferent to others’ opinions. Instead, I shifted my focus towards bridging the gap in knowledge and skills, and extracting lessons from each completed task. I bet that some colleagues might perceive my attitude as an expression of arrogance; however, the truth is quite the opposite. ...

October 21, 2023 · 5 min · Brian Tran

Data Cleaning 101 in SQL - #4.2 A Practical Tutorial for Data Deduplication

I am glad to have you for the fourth part of my complete guide on data cleaning.

#1: Tidying Messy Data
#2: Dealing with Missing Data
#3.1: A discussion on the Nature of Outliers
#3.2: The Origin of Outliers & Detection Techniques
#4.1: Where does Data Duplication come from?
#4.2: A Practical Tutorial for Data Deduplication <– You are here

Can you spot the impostor? - Picture by the author ...

September 16, 2023 · 5 min · Brian Tran

7 Deadly Fears I Have Faced - Part 1 - A Data Analyst Journal

Photo by Patty Vanitas on Flickr

There are six things the Lord hates, seven that are detestable to him: haughty eyes, a lying tongue, hands that shed innocent blood, a heart that devises wicked schemes, feet that are quick to rush into evil, a false witness who pours out lies and a person who stirs up conflict in the community.
- Proverbs 6:16-19

I am glad to have you for the first part of my career journal. ...

September 9, 2023 · 12 min · Brian Tran