PRACTICAL AI Research Blog
tags: [transformers, image classification, self-attention, cross-attention]
Transformers have proven to be a highly effective neural network architecture across a variety of domains, most notably NLP and vision. The original design is an encoder-decoder architecture: the encoder block generates keys and values for the decoder through multi-headed self-attention, while the decoder block generates queries through a masked multi-headed self-attention block. The encoder's key-value pairs and the decoder's queries are then combined in another multi-headed attention block (effectively cross-attention) to produce output probability vectors.
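As a rough sketch of the data flow described above, here is a minimal single-head cross-attention computation in NumPy. The projection matrices are randomly initialized here (in a real transformer they are learned), and multi-head splitting, masking, and the output projection are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, d_k=64):
    """Single-head cross-attention: queries come from the decoder,
    keys and values come from the encoder."""
    rng = np.random.default_rng(0)
    d_model = decoder_states.shape[-1]
    # projection matrices (random here; learned parameters in practice)
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    Q = decoder_states @ W_q            # (T_dec, d_k)
    K = encoder_states @ W_k            # (T_enc, d_k)
    V = encoder_states @ W_v            # (T_enc, d_k)
    scores = Q @ K.T / np.sqrt(d_k)     # (T_dec, T_enc) scaled dot products
    return softmax(scores, axis=-1) @ V # (T_dec, d_k)

enc = np.random.default_rng(1).normal(size=(5, 32))  # 5 encoder positions
dec = np.random.default_rng(2).normal(size=(3, 32))  # 3 decoder positions
out = cross_attention(dec, enc, d_k=16)
print(out.shape)  # (3, 16)
```

Note that each decoder position attends over all encoder positions, which is exactly how the decoder consumes the encoder's keys and values.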
tags: [machine learning, polynomial regression, non-linear regression, back-propagation]
This blog post first provides a basic understanding of neural network components and then uses these concepts to build and train a neural network from scratch, using only NumPy, for a real problem.
Traditionally, linear least squares is used to estimate polynomial coefficients; however, it requires an explicit assumption about the model and the polynomial order. A neural network, on the other hand, makes no assumption about the function that generated the data; it simply tries to approximate the underlying model from the available noisy dataset.
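To make the "explicit assumption" concrete: a least-squares fit such as NumPy's `polyfit` takes the polynomial order as an input. A minimal illustration (the coefficients and noise level below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
true_coeffs = np.array([0.5, -3.0, 2.0])   # 0.5*x^2 - 3*x + 2, highest power first
y = np.polyval(true_coeffs, x) + rng.normal(scale=0.3, size=x.shape)

# Least squares requires choosing the polynomial order explicitly (here, 2)
est = np.polyfit(x, y, deg=2)
print(est)  # close to [0.5, -3.0, 2.0]
```

If `deg` is chosen wrong (say, 1 for quadratic data), the fit is systematically biased, which is the limitation the neural-network approach sidesteps.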
In this blog, we consider a toy problem of polynomial function regression: we define an n-th order polynomial function and generate noisy data points from it. The goal is then to learn, with a simple one-layer neural network, the coefficients of the polynomial that best fit the data.
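One way to sketch such a one-layer network is a linear layer over polynomial features, trained by gradient descent on the mean squared error. The coefficients, learning rate, and data range below are illustrative assumptions, not necessarily the blog's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
order = 3
x = rng.uniform(-1, 1, size=200)
true_w = np.array([1.0, -2.0, 0.0, 3.0])      # c0 + c1*x + c2*x^2 + c3*x^3
X = np.vander(x, order + 1, increasing=True)  # feature matrix [1, x, x^2, x^3]
y = X @ true_w + rng.normal(scale=0.05, size=x.shape)

w = np.zeros(order + 1)   # the single linear layer's weights = the coefficients
lr = 0.1
for step in range(5000):
    y_hat = X @ w
    grad = 2 * X.T @ (y_hat - y) / len(x)     # gradient of MSE w.r.t. w
    w -= lr * grad

print(w)  # should approach true_w
```

The learned weights are directly interpretable as the polynomial coefficients, which is what makes this toy problem a nice sandbox for back-propagation.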
word2vec is a family of algorithms introduced about a decade ago by Mikolov et al. at Google that learns word embeddings from large datasets in an unsupervised way. These algorithms have been tremendously popular and have paved a path for machines to understand context in natural language. The embeddings are lower-dimensional vector representations of each word in a given vocabulary within a certain context, and they capture the word's semantic and syntactic meaning.
This is a two-part blog post: we first build an intuition for how to formulate such word embeddings, and then explore learning them in a self-supervised way using machine learning (ML). We will build the complete word2vec pipeline without any ML framework to better understand the underlying mechanics. Along the way, we will derive the equations used in our Stochastic Gradient Descent optimizer implementation.
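As a taste of the pipeline's first step, the skip-gram formulation pairs each center word with the words around it, and each vocabulary word gets a row in a dense embedding matrix. The toy corpus, window size, and embedding dimension below are made-up assumptions for illustration:

```python
import numpy as np

# hypothetical toy corpus; skip-gram generates (center, context) training pairs
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}

def skipgram_pairs(tokens, window=2):
    """All (center_id, context_id) pairs within the given window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word_to_id[center], word_to_id[tokens[j]]))
    return pairs

pairs = skipgram_pairs(corpus)

# one dense vector per vocabulary word; training would update these rows
dim = 8
rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))
center_vec = embeddings[word_to_id["fox"]]
print(len(pairs), center_vec.shape)  # 30 (8,)
```

Training then amounts to nudging the rows of `embeddings` so that center and context vectors of observed pairs score higher than random pairs, which is where the SGD derivations come in.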