Posts

Showing posts from May, 2022

Transfer learning vs fine tuning

  Transfer learning is about “transferring” learnt representations to another problem. For example, one can use features from a pre-trained convolutional neural network (convNet) to power a linear support vector machine (SVM). In that case the pre-trained model is held fixed while only the linear SVM weights are updated. Fine tuning, on the other hand, is about making small adjustments to further improve performance: for example, during transfer learning you can unfreeze the pre-trained model and let it adapt further to the task at hand. Thus transfer learning is about projecting all new inputs through a pre-trained model: if we have a pre-trained model function f() and wish to learn a new function g…
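The frozen-extractor idea above can be sketched in a few lines of numpy. Here f() is a toy stand-in for a pre-trained model (a fixed random projection with a ReLU, never updated), and the new function g() is a small logistic-regression head trained on top of f()'s features; only g()'s weights move, which is the transfer-learning setting described above. All names and the toy task are illustrative assumptions, not a real pre-trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" feature extractor f(): a fixed (frozen) random
# projection plus ReLU. In practice this would be a convNet trained on a
# large dataset; here it just stands in for any frozen representation.
W_frozen = rng.normal(size=(2, 8))

def f(x):
    return np.maximum(x @ W_frozen, 0.0)  # weights are never updated

# Toy downstream task: label = 1 when x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Learn g() on top of f(): a logistic-regression head trained by gradient
# descent. Only w and b are updated -- this is transfer learning.
feats = f(X)
w = np.zeros(feats.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = p - y
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(f(X) @ w + b))) > 0.5) == y).mean()
print(f"train accuracy with frozen f(): {acc:.2f}")
```

Fine tuning would correspond to additionally letting `W_frozen` receive gradient updates instead of holding it fixed.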

What is the difference between Sentence Encodings and Contextualized Word Embeddings?

  A contextualized word embedding is a vector representing a word in a specific context. Traditional word embeddings such as Word2Vec and GloVe generate one vector per word, whereas a contextualized word embedding generates a vector for a word depending on its context. Consider the sentences “The duck is swimming” and “You shall duck when someone shoots at you”. With traditional word embeddings, the word vector for “duck” would be the same in both sentences, whereas in the contextualized case it should differ. While word embeddings encode words into a vector representation, there is also the question of how to represent a whole sentence in a way a computer can easily work with. Sentence encodings embed a whole sentence as one vector; doc2vec, for example, generates a vector for a sentence, and BERT also produces a representation for the whole sentence via the [CLS] token. So in short, a contextualized w…
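The “duck” example can be made concrete with a toy numpy sketch. The static table plays the role of Word2Vec/GloVe, and the “contextualized” function crudely mixes a word's vector with its sentence average, a hypothetical stand-in for what BERT or ELMo compute with attention over the full context; it is not a real model, just an illustration of the definitions above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Static (Word2Vec/GloVe-style) lookup table: one fixed vector per word.
vocab = ["the", "duck", "is", "swimming", "you", "shall", "when",
         "someone", "shoots", "at"]
static = {w: rng.normal(size=4) for w in vocab}

def static_embed(word, sentence):
    return static[word]  # context is ignored entirely

def contextual_embed(word, sentence):
    # Toy "contextualized" embedding: the word's static vector mixed with
    # the mean of its sentence, so the same word gets different vectors
    # in different contexts.
    ctx = np.mean([static[w] for w in sentence], axis=0)
    return 0.5 * static[word] + 0.5 * ctx

s1 = ["the", "duck", "is", "swimming"]
s2 = ["you", "shall", "duck", "when", "someone", "shoots", "at", "you"]

same_static = np.allclose(static_embed("duck", s1), static_embed("duck", s2))
same_ctx = np.allclose(contextual_embed("duck", s1), contextual_embed("duck", s2))
print(same_static, same_ctx)  # static vectors match; contextual ones differ
```

The sentence mean used inside `contextual_embed` is itself a (very crude) sentence encoding: one vector for the whole sentence, in the spirit of doc2vec or BERT's [CLS] representation.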

Depth Data Science Interview

    Here are 3 practical ML tips you don't read about in textbooks. I learned these while building ML solutions at PayPal and Google. Want to level up your ML skills? Read on 👇

    (1) Variable Importance on Collinear Features ❗ Don't blindly trust variable importance from a random forest. The variable importance of a feature increases whenever the model splits on that feature, so when two features are collinear the importance gets diluted between them. ⭐ The better approach is to remove collinearity first, using variable selection with Pearson/Spearman correlation, VIF, or Lasso regression. Then fit the random forest (or any other tree-based model) and interpret the variable importance of the remaining features.

    (2) Random Forest (RF) on Continuous Target Variable ❗ If you are using RF or other tree-based models (e.g. XGBoost), be aware that your target prediction…
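The correlation-based variable selection in tip (1) can be sketched with a greedy Pearson filter in numpy: build two nearly collinear copies of one signal, then keep a feature only if its absolute correlation with every already-kept feature stays below a threshold. The `drop_collinear` helper and the 0.9 cutoff are illustrative assumptions, not a fixed recipe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy design matrix: x0 and x1 are nearly collinear copies of the same
# signal; x2 is independent of both.
n = 500
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.01 * rng.normal(size=n),  # x0
    signal + 0.01 * rng.normal(size=n),  # x1 (collinear with x0)
    rng.normal(size=n),                  # x2 (independent)
])

def drop_collinear(X, threshold=0.9):
    """Greedy Pearson filter: keep a feature only if its absolute
    correlation with every already-kept feature is below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

kept = drop_collinear(X)
print("kept feature indices:", kept)  # x1 is dropped
```

A tree-based model fit on the kept columns no longer splits importance between the two collinear copies, which is exactly the dilution the tip warns about.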