Posts
New home for Datageeko.com; Bye Wordpress!
others
Fun weekend project for migrating Wordpress to Jekyll.
The Search for Universal Correlation
data-science
There are various ways to measure association between variables. Is there a universal one to rule them all?
How to Perform EDA Efficiently
data-science
You have been tasked with crunching a dataset and extracting insights in less than 24 hours. Before we put our heads down and grind harder, is there a more efficient strategy to go about it?
Statistical Bias and Paradoxes that creep up in your data analysis
data-science random-questions statistics
Statistical bias can creep into our analysis, causing us to communicate the wrong insights and drive home the wrong conclusions.
Practical A/B Testing
data-science statistics
A Jupyter notebook is embedded within this post. Visit the notebook here if it does not render properly in your browser.
Test your knowledge - Tricky Probability questions with answers
statistics test-your-knowledge
This post is displayed directly from my notebook on GitHub. You might want to view it on GitHub directly if it doesn’t render properly in your browser.
Let's say we have 1 million app rider journey trips. We want to build a model to predict ETA after a rider makes a ride request...
data-science machine-learning test-your-knowledge
...how would we know if we have enough data to create an accurate enough model?
Let's say you have a categorical variable with thousands of distinct values, how would you encode it?
machine-learning test-your-knowledge
One-hot encoding is out of the question, since a large number of distinct values leads to high-dimensionality problems (the curse of dimensionality) at the modeling stage.
Let's say we want to build a model to predict booking prices for a hotel booking company. Between linear regression and random forest regression, which model would perform better and why?
machine-learning statistics test-your-knowledge
Before we quickly answer “Random Forest”, let’s take a step back, put on our structured thinking cap, and ask ourselves why, and why companies in real life might make the other choice.
Illustrated guide to Hypothesis testing using Python
data-science python statistics
This is a hands-on guide to hypothesis testing, in which we use both “hand coded” implementations and common statistical libraries to calculate different statistical tests.