Depth Data Science Interview

May 19, 2022

Here are 3 practical ML tips you don't read about in textbooks.

I learned these while building ML solutions at 𝗣𝗮𝘆𝗣𝗮𝗹 and 𝗚𝗼𝗼𝗴𝗹𝗲.

Want to level up your ML skills? Read on👇

(𝟭) 𝗩𝗮𝗿𝗶𝗮𝗯𝗹𝗲 𝗜𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝗰𝗲 𝗼𝗻 𝗖𝗼𝗹𝗹𝗶𝗻𝗲𝗮𝗿 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀

❗Don't trust variable importance from random forest blindly. A feature's (impurity-based) importance accumulates each time a tree splits on that feature. When two features are collinear, the trees can split on either one, so the importance is shared between them and each looks less important than it really is.

⭐ The better approach is to remove collinearity first with variable selection using Pearson/Spearman correlation, VIF, or Lasso regression. Then you can fit a random forest or any other tree-based model as the final model and interpret its variable importance.
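A minimal sketch of the idea, assuming NumPy and scikit-learn are available. The synthetic data, the `drop_correlated` helper, and the 0.9 correlation threshold are illustrative assumptions, not part of the original tip. It fits a random forest before and after a simple Pearson-correlation filter so you can compare how much importance `x1` receives.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def drop_correlated(X, names, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


# Synthetic data (assumption): x1_copy is almost a duplicate of x1.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x1_copy = x1 + rng.normal(scale=0.01, size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x1_copy, x2])
names = ["x1", "x1_copy", "x2"]

# Importance with the collinear pair left in: x1 shares credit with x1_copy.
rf_all = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
imp_all = dict(zip(names, rf_all.feature_importances_))

# Importance after filtering out the near-duplicate.
X_sel, names_sel = drop_correlated(X, names)
rf_sel = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_sel, y)
imp_sel = dict(zip(names_sel, rf_sel.feature_importances_))

print("before filtering:", imp_all)
print("after filtering: ", imp_sel)
```

The greedy filter keeps whichever feature of a correlated pair comes first, so in a real project you may prefer to order the columns by domain relevance before filtering.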

(𝟮) 𝗥𝗮𝗻𝗱𝗼𝗺 𝗙𝗼𝗿𝗲𝘀𝘁 (𝗥𝗙) 𝗼𝗻 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗧𝗮𝗿𝗴𝗲𝘁 𝗩𝗮𝗿𝗶𝗮𝗯𝗹𝗲

❗If you are using RF or other tree-based models (e.g. XGBoost), be aware that they cannot extrapolate: predictions are effectively capped at the y range the model saw in training.

For instance, suppose that the train_y range is (100, 1000), but the test_y range is (300, 1500). The model will never predict a value beyond 1,000!

⭐ If you suspect the y-range to be unbounded, consider a model that can extrapolate, such as a linear model (OLS, Lasso) or a dense neural network.
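Here is a small sketch of the effect, assuming NumPy and scikit-learn. The data is synthetic and chosen to mirror the ranges in the example above (training targets between 100 and 1,000, test targets up to 1,500); the linear relationship `y = 100 * x` is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Training targets fall between 100 and 1,000.
X_train = rng.uniform(1, 10, size=(500, 1))
y_train = 100 * X_train[:, 0]

# Test inputs go up to x = 15, where the true target is 1,500.
X_test = np.linspace(3, 15, 50).reshape(-1, 1)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)

rf_pred = rf.predict(X_test)
ols_pred = ols.predict(X_test)

# The forest averages training targets inside each leaf, so it cannot
# exceed the largest y it was trained on; OLS follows the fitted line.
print("max y seen in training:", y_train.max())
print("max RF prediction:     ", rf_pred.max())
print("max OLS prediction:    ", ols_pred.max())
```

A quick sanity check like this on your own data (compare the largest prediction against the largest training target) can tell you whether the cap is affecting your test set.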

(𝟯) 𝗨𝘀𝗲 𝗦𝗶𝗺𝗽𝘀𝗼𝗻'𝘀 𝗣𝗮𝗿𝗮𝗱𝗼𝘅 𝘁𝗼 𝗶𝗺𝗽𝗿𝗼𝘃𝗲 𝘆𝗼𝘂𝗿 𝗺𝗼𝗱𝗲𝗹

❗If your model is underperforming the benchmark, don't just add more signals or widen the hyperparameter search. Do EDA on the residuals of the model. For instance, the global accuracy of your model might be 85%, but when you segment it by cohort (e.g. gender, age, product category), accuracy can be noticeably higher in some cohorts and lower in others.

⭐ For the segments where the model underperforms, conduct EDA to see if there are additional signals you can add to improve it.
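One way to run this check is to compute the metric per cohort next to the global number. The sketch below uses only the Python standard library; the `accuracy_by_segment` helper, the labels, and the cohort names "A" and "B" are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_segment(y_true, y_pred, segments):
    """Return {segment: accuracy} for parallel lists of labels,
    predictions, and segment names."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, s in zip(y_true, y_pred, segments):
        totals[s] += 1
        hits[s] += int(t == p)
    return {s: hits[s] / totals[s] for s in totals}


# Toy data (assumption): six rows in cohort "A", four in cohort "B".
y_true = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
segments = ["A"] * 6 + ["B"] * 4

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_segment = accuracy_by_segment(y_true, y_pred, segments)

print("overall accuracy:", overall)
for name, acc in sorted(by_segment.items(), key=lambda kv: kv[1]):
    print(f"cohort {name}: {acc:.2f}")
```

Sorting cohorts from worst to best puts the segments that most need extra signals at the top of the list.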

This is the depth of ML knowledge that you should consider in practice and for data science interviews.
