## Determinantal Point Process

在机器学习中，我们通常会面临一个问题：给定一个集合$\mathbf{S}$，从中寻找$k$个样本构成子集$\mathbf{V}$，尽量使得子...

在机器学习中，我们通常会面临一个问题：给定一个集合$\mathbf{S}$，从中寻找$k$个样本构成子集$\mathbf{V}$，尽量使得子...

定义 若一个分布能够以下述方式进行表示，则称之为指数族（ Exponential Family）的一员 $$ p(y; \eta ) = b(y)\exp(\eta^{\mathbf{T}}T(y) - a(\eta )) $$ 其中$\eta$被称为分布的自然参数（nat...

这是基于 Hugo 系列主题第一篇文章，因为之前是在 Jekyll 上进行渲染，故而 Hello World也有更新 为啥变动 那么为啥从 Jekyll 变到 Hugo 呢？原因其实有几点： 之前用的主题看...

鉴于网上此类教程有不少模糊不清，对原理不得其法，代码也难跑通，故而花了几天细究了一下相关原理和实现，欢迎批评指正！ 关于此部分的代码，可以去这...

1. Gradient descent optimization Gradient-based methods make use of the gradient information to adjust the parameters. Among them, gradient descent can be the simplest. Gradient descent makes the parameters to walk a small step in the direction of the negative gradient. $$ \mathbf{w}^{\tau + 1} = \mathbf{w}^{\tau} - \eta \nabla_{\mathbf{w}^{\tau}} E \tag{1.1} $$ where \(\eta, \tau, E\) label learning rate (\(\eta > 0\)), the iteration step and the loss function....

Recently, I have read a blog about training neural networks (simplified as NN in the rest part of this post) and it is really amazing. I am going to add my own experience in this post along with summarizing that blog’s interesting part. Nowadays, it seems like that training NN is extremely easy for there are plenty of free frameworks which are simple to use (e.g. PyTorch, Numpy, Tensorflow). Well, training NN is easy when you are copying others’ work (e....

Life is complex, and it has both real and imaginary parts. — Someone Basically, I’m not interested in doing research and I never have been… I’m interested in understanding, which is quite a different thing. And often to understand something you have to work it out yourself because no one else has done it. — David Blackwell To not know maths is a severe limitation to understanding the world. — Richard Feynman...

Problems To Solve To Scale Down the model size while maintaining the performances. To incorporate External Memory Retrieval in the Large Language Model Modeling. How? Data Construction Training & Evaluation set: \(\text{MassiveText}\) for both training & retrieval data (contains 5 trillion tokens) SentencePiece with a vocabulary of \(128K\) tokens During training, we retrieving \(600B\) tokens from the training The evaluation contains \(1.75T\) tokens Test set leakage: Due to the huge retrieving database, the test set may have appeared in the training set....