# Hi there 👋

Hi, this is Yunpeng Tai. I blog to think about interesting things.

## Generalized Linear Models

February 17, 2023 · 3632 words

## New Theme

June 19, 2022 · Updated: February 12, 2023 · 2228 words

## Diving into Distributed Training in PyTorch

November 20, 2022 · 5454 words

## Going Deeper into Back-propagation

1. Gradient descent optimization. Gradient-based methods use gradient information to adjust the parameters. Among them, gradient descent is perhaps the simplest: it moves the parameters a small step in the direction of the negative gradient. $$\mathbf{w}^{\tau + 1} = \mathbf{w}^{\tau} - \eta \nabla_{\mathbf{w}^{\tau}} E \tag{1.1}$$ where $$\eta, \tau, E$$ denote the learning rate ($$\eta > 0$$), the iteration step and the loss function....

September 7, 2022 · 1054 words

## Tips for Training Neural Networks

Recently, I read a blog post about training neural networks (abbreviated as NN in the rest of this post), and it is really amazing. In this post I am going to summarize the interesting parts of that blog post and add my own experience. Nowadays, training NN seems extremely easy because there are plenty of free frameworks that are simple to use (e.g. PyTorch, NumPy, TensorFlow). Well, training NN is easy when you are copying others’ work (e....

July 30, 2022 · 1793 words

## Quotes of Mathematicians

Life is complex, and it has both real and imaginary parts. — Someone Basically, I’m not interested in doing research and I never have been… I’m interested in understanding, which is quite a different thing. And often to understand something you have to work it out yourself because no one else has done it. — David Blackwell To not know maths is a severe limitation to understanding the world. — Richard Feynman...

July 23, 2022 · 636 words

## Retrieval-Enhanced Transformer

Problems to solve: to scale down the model size while maintaining performance, and to incorporate external memory retrieval into large language modeling. How? Data construction. Training & evaluation set: $$\text{MassiveText}$$ for both training & retrieval data (contains 5 trillion tokens); SentencePiece with a vocabulary of $$128K$$ tokens; during training, we retrieve $$600B$$ tokens from the training data; the evaluation contains $$1.75T$$ tokens. Test set leakage: due to the huge retrieval database, the test set may have appeared in the training set....

June 19, 2022 · 718 words