DataGeeko.com

Posts

New home for Datageeko.com; Bye Wordpress!
May 26, 2021 others
A fun weekend project: migrating from WordPress to Jekyll.
The Search for Universal Correlation
Apr 26, 2021 data-science
There are various ways to measure association between variables. Is there a universal one to rule them all?
How to Perform EDA Efficiently
Apr 26, 2021 data-science
You have been tasked with crunching a dataset and extracting insights in less than 24 hours. Before we put our heads down and grind harder, is there a more efficient way to go about it?
Statistical Bias and Paradoxes that creep up in your data analysis
Mar 12, 2021 data-science random-questions statistics
Statistical bias can creep into our analysis and cause us to communicate the wrong insights and draw the wrong conclusions.
Practical A/B Testing
Feb 11, 2021 data-science statistics
A Jupyter notebook is embedded within this post. Visit the notebook here if it cannot render properly in your browser.
Test your knowledge - Tricky Probability questions with answers
Jan 22, 2021 statistics test-your-knowledge
This post is displayed directly from my notebook on GitHub. You might want to view it on GitHub directly if it doesn’t render properly in your browser.
Let's say we have 1 million app rider journey trips. We want to build a model to predict ETA after a rider makes a ride request...
Dec 29, 2020 data-science machine-learning test-your-knowledge
...how would we know if we have enough data to create an accurate enough model?
Let's say you have a categorical variable with thousands of distinct values, how would you encode it?
Dec 28, 2020 machine-learning test-your-knowledge
One-hot encoding is out of the question, since a large number of distinct values will cause high-dimensionality problems (the Curse of Dimensionality) in the modeling stage.
Let's say we want to build a model to predict booking prices for a hotel booking company. Between linear regression and random forest regression, which model would perform better and why?
Dec 28, 2020 machine-learning statistics test-your-knowledge
Before we quickly answer “Random Forest”, let’s take a step back, put on our structured thinking cap, and ask ourselves why, and why companies in real life might make the other choice.
Illustrated guide to Hypothesis testing using Python
Dec 18, 2020 data-science python statistics
This is a hands-on guide to hypothesis testing, where we use both “hand-coded” implementations and common statistical libraries to calculate different statistical tests.