Posts

Showing posts from April, 2022

Classification Metrics

  Classification Metrics
  - accuracy, precision, recall, F1 score
  - TPR (also called sensitivity or recall) and FPR
  - ROC curve: shows how confidently the model separates the positive and negative classes; least affected by a change in threshold
  - AUC: area under the ROC curve; the x = y line is the random-classifier baseline
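As a minimal sketch of how these metrics relate to each other (pure Python; the labels below are made-up toy data, not from any real model):

```python
# Toy illustration of the metrics listed above, computed from a confusion matrix.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)            # also called TPR / sensitivity
fpr = fp / (fp + tn)               # false positive rate (x-axis of the ROC curve)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Sweeping the decision threshold and plotting (fpr, recall) pairs traces out the ROC curve; AUC is the area under it.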

Regression Metrics

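The usual regression metrics (MAE, MSE, RMSE, R²) can be sketched in pure Python; the targets and predictions below are made-up example values:

```python
import math

# Made-up example targets and predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 6.5]

n = len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n    # mean absolute error
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n  # mean squared error
rmse = math.sqrt(mse)                                        # root mean squared error

mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
r2 = 1 - ss_res / ss_tot  # coefficient of determination: 1 is a perfect fit
```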

Need for Feature Scaling

  Scaling features helps optimization algorithms reach the minimum of the cost function quickly, and it prevents models from being biased towards features with higher- or lower-magnitude values. Normalization and standardization are the two common scaling techniques; standardization works best when the samples are approximately Gaussian (normally) distributed.
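The two techniques can be sketched in a few lines of pure Python (the feature values are a made-up example):

```python
import statistics

# Hypothetical feature with a large magnitude range
values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Normalization (min-max scaling): rescales values into [0, 1]
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization (z-score): shifts to zero mean and unit variance;
# works best when the data is roughly Gaussian
mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation
standardized = [(v - mean) / std for v in values]
```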

Cost and Loss Functions

Cost and loss functions are synonymous (some people also call them error functions). The more general scenario is to define an objective function first, which you want to optimize. This objective function could be to:
- maximize the posterior probabilities (e.g., naive Bayes)
- maximize a fitness function (genetic programming)
- maximize the total reward/value function (reinforcement learning)
- maximize information gain / minimize child-node impurities (CART decision tree classification)
- minimize a mean squared error cost (or loss) function (CART decision tree regression, linear regression, adaptive linear neurons, ...)
- maximize the log-likelihood or minimize the cross-entropy loss (or cost) function
- minimize hinge loss (support vector machines)
A loss function is a part of a cost function, which is a type of objective function. The purpose of a cost function is to be either: Minimized — then the returned value is usually called cost, loss or error. The goal is to find...
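Two of the objectives listed above, mean squared error and cross-entropy, can be sketched in pure Python (the labels and probabilities are made-up toy values):

```python
import math

def mse(y_true, y_pred):
    # Mean squared error: the cost minimized in regression-style objectives
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    # Binary cross-entropy: minimizing it is equivalent to
    # maximizing the log-likelihood of the labels
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob)) / len(y_true)

loss_good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
loss_bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident but wrong
```

Confidently wrong predictions are penalized far more heavily than confidently correct ones, which is what drives the optimization.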

Use random Seeds Efficiently

 Use random seeds efficiently - for reproducible results - to measure how much an outcome such as accuracy varies across different random seeds
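Both ideas can be shown with the standard library alone; the seed-to-seed "outcome" here is just a sample mean standing in for a metric like accuracy:

```python
import random

# Fixing the seed makes a run reproducible
random.seed(42)
run_a = [random.random() for _ in range(3)]
random.seed(42)
run_b = [random.random() for _ in range(3)]
assert run_a == run_b  # identical sequences

# Varying the seed shows how much an outcome fluctuates across runs
outcomes = []
for seed in range(5):
    random.seed(seed)
    sample = [random.random() for _ in range(100)]
    outcomes.append(sum(sample) / len(sample))

spread = max(outcomes) - min(outcomes)  # seed-to-seed variation of the outcome
```

A small spread suggests the reported result is stable; a large one means a single seed's number is misleading.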

Gini vs Entropy

 Gini vs Entropy
 - Entropy uses a log function, hence more training time; it is also computationally heavier
 - Results are mostly identical
 - Four types of features: informative, redundant, repetitive, random
 - F-score: a measure of a model's accuracy on a dataset
 - F1 score: the harmonic mean of precision and recall
 - A model is sensible only when both precision and recall are high; the F1 score is calculated to capture this important property
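The two impurity measures can be sketched in pure Python; note that entropy needs a log call per class, which is why it is the costlier of the two:

```python
import math

def gini(probs):
    # Gini impurity: 1 - sum(p^2); no log, so cheaper to compute
    return 1 - sum(p ** 2 for p in probs)

def entropy(probs):
    # Entropy: -sum(p * log2(p)); the log call makes it costlier
    return -sum(p * math.log2(p) for p in probs if p > 0)

pure = [1.0, 0.0]    # a pure node: both measures are 0
mixed = [0.5, 0.5]   # a 50/50 split: both measures peak here
```

Both curves have the same shape (zero for pure nodes, maximal for even splits), which is why the resulting trees are usually nearly identical.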

Commonwealth Bank of Australia - Interview

 - dealing with latitude and longitude data without an API or library - whether a classification problem is linear or non-linear - present the projects with all the approaches, challenges and keywords
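For the latitude/longitude point, one common no-API, no-library approach is deriving the great-circle distance yourself with the haversine formula; a pure-Python sketch (the coordinates below are approximate and just for illustration):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two (lat, lon) points,
    # computed from scratch without any geo API or library
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates for Sydney and Melbourne
dist = haversine_km(-33.87, 151.21, -37.81, 144.96)
```

Such a distance (e.g. to a city centre or branch) can then serve as a numeric feature in place of raw coordinates.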