Tai's Blog

Bias Variance Decomposition

引言我们规定，训练集记为$\mathcal{D}$，我们从中取一个样本$\boldsymbol{x}$，其训练集标签为$y_{\mathca...

Noise Contrastive Estimation

难以承受之重文本生成是 NLP 任务中比较典型的一类，记参数为$\boldsymbol{\theta }$，给定的 context 为$\boldsymbol{c}$...

Fast Greedy MAP Inference for DPP

问题先规定一些术语：记选中元素构成的集合为$\mathcal{S}$，未选中构成的元素记为$\mathcal{R}$，$\mathbf{L}...

Determinantal Point Process

在机器学习中，我们通常会面临一个问题：给定一个集合$\mathbf{S}$，从中寻找$k$个样本构成子集$\mathbf{V}$，尽量使得子...

Generalized Linear Models

定义若一个分布能够以下述方式进行表示，则称之为指数族（ Exponential Family）的一员 $$ \begin{equation} p(y; \eta ) = b(y)\exp(\eta^{\mathbf{T}}T(y) - a(\eta )) \end{equation} $$ 其中$\eta$被称为分布的自然参数（n...

Diving in distributed training in PyTorch

鉴于网上此类教程有不少模糊不清，对原理不得其法，代码也难跑通，故而花了几天细究了一下相关原理和实现，欢迎批评指正！代码开源在此： DL-Tools Cache effective tools for deep...

Going Deeper into Back-Propagation

1. Gradient descent optimization Gradient-based methods make use of the gradient information to adjust the parameters. Among them, gradient descent can be the simplest. Gradient descent makes the parameters to walk a small step in the direction of the negative gradient. $$ \boldsymbol{w}^{\tau + 1} = \boldsymbol{w}^{\tau} - \eta \nabla_{\boldsymbol{w}^{\tau}} E \tag{1.1} $$ where $\eta, \tau, E$ label learning rate ($\eta > 0$), the iteration step and the loss function. Wait!...

Tips for Training Neural Networks

Recently, I have read a blog about training neural networks (simplified as NN in the rest part of this post) and it is really amazing. I am going to add my own experience in this post along with summarizing that blog’s interesting part. Nowadays, it seems like that training NN is extremely easy for there are plenty of free frameworks which are simple to use (e.g. PyTorch, Numpy, Tensorflow). Well,...