Bias-Variance Decomposition

Introduction We stipulate that the training set is denoted $\mathcal{D}$; from it we take a sample $\boldsymbol{x}$, whose training-set label is $y_{\mathcal{D}}$... (the decomposition the post builds toward is sketched below)

June 21, 2023 · 991 words
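As a pointer to where this post lands, here is the standard squared-error decomposition. The notation is mine, not necessarily the post's: $f$ is the noise-free target, $\hat{f}_{\mathcal{D}}$ the model trained on $\mathcal{D}$, and $\sigma^2$ the label-noise variance.

$$
\mathbb{E}_{\mathcal{D},\,\epsilon}\big[(y - \hat{f}_{\mathcal{D}}(\boldsymbol{x}))^2\big]
= \underbrace{\big(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(\boldsymbol{x})] - f(\boldsymbol{x})\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\Big[\big(\hat{f}_{\mathcal{D}}(\boldsymbol{x}) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(\boldsymbol{x})]\big)^2\Big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$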

Noise Contrastive Estimation

The Unbearable Weight Text generation is a fairly typical class of NLP tasks. Denote the parameters by $\boldsymbol{\theta}$ and the given context by $\boldsymbol{c}$... (the softmax normalizer the title alludes to is written out below)

May 29, 2023 · 4178 words
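For context on that title (my reading and notation; $s_{\boldsymbol{\theta}}$ is a hypothetical score function): the standard autoregressive softmax requires a normalizer summed over the entire vocabulary $\mathcal{V}$, the expensive partition function that NCE is designed to sidestep.

$$
p_{\boldsymbol{\theta}}(w \mid \boldsymbol{c}) = \frac{\exp\big(s_{\boldsymbol{\theta}}(w, \boldsymbol{c})\big)}{\sum_{w' \in \mathcal{V}} \exp\big(s_{\boldsymbol{\theta}}(w', \boldsymbol{c})\big)}
$$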

Fast Greedy MAP Inference for DPP

The Problem First, some terminology: let $\mathcal{S}$ denote the set of selected elements, $\mathcal{R}$ the set of unselected elements, and $\mathbf{L}$... (a naive greedy baseline is sketched below)

May 16, 2023 · 4188 words
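To make the setting concrete, here is a deliberately naive greedy baseline, a minimal sketch assuming $\mathbf{L}$ is a PSD kernel matrix; the function name and setup are mine, and the post derives a much faster incremental version. At each step it moves the element of $\mathcal{R}$ into $\mathcal{S}$ that maximizes $\log\det(\mathbf{L}_{\mathcal{S}})$.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Naive greedy MAP inference for a DPP with PSD kernel L.

    Each step recomputes log det(L_S) from scratch for every candidate,
    which is the slow baseline the fast algorithm improves on.
    """
    n = L.shape[0]
    selected = []               # the set S of chosen indices
    remaining = set(range(n))   # the set R of unselected indices
    for _ in range(k):
        best_j, best_logdet = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            # log det of the principal submatrix L_{S ∪ {j}};
            # slogdet is used for numerical stability
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_logdet, best_j = logdet, j
        if best_j is None:      # no candidate keeps the submatrix nonsingular
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Usage: a random PSD kernel over 10 items, pick 3.
rng = np.random.default_rng(0)
B = rng.normal(size=(10, 5))
L = B @ B.T
print(greedy_dpp_map(L, k=3))
```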

Determinantal Point Process

In machine learning we often face the following problem: given a set $\mathbf{S}$, select $k$ samples from it to form a subset $\mathbf{V}$, trying as far as possible to make the sub... (the standard DPP subset probability this builds on is recalled below)

April 21, 2023 · 2889 words
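For reference, the standard definition the excerpt is presumably building toward: a DPP with kernel $\mathbf{L}$ assigns a subset $\mathbf{V}$ probability proportional to the determinant of the corresponding principal submatrix, so more diverse subsets are more likely.

$$
P(\mathbf{V}) \propto \det(\mathbf{L}_{\mathbf{V}})
$$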

Generalized Linear Models

Definition A distribution is said to be a member of the Exponential Family if it can be expressed in the following form $$ p(y; \eta) = b(y)\exp\big(\eta^{\top} T(y) - a(\eta)\big) $$ where $\eta$ is called the natural parameter of the distribution... (a worked Bernoulli example follows below)

February 17, 2023 · 3664 words
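As a quick sanity check of the definition, a standard worked example (with $\phi$ the Bernoulli mean, my notation): the Bernoulli distribution fits the template with $b(y) = 1$, $T(y) = y$, $\eta = \log\frac{\phi}{1-\phi}$, and $a(\eta) = \log(1 + e^{\eta})$.

$$
p(y; \phi) = \phi^{y}(1-\phi)^{1-y} = \exp\!\Big(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\Big)
$$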

Diving into Distributed Training in PyTorch

Given that many online tutorials on this topic are vague, fail to get at the underlying principles, and come with code that is hard to get running, I spent a few days digging into the relevant principles and implementation; criticism and corrections are welcome! The code is open-sourced here: DL-Tools Cache effective tools for deep...

November 20, 2022 · 5172 words

Going Deeper into Back-Propagation

1. Gradient descent optimization Gradient-based methods use gradient information to adjust the parameters. Among them, gradient descent is perhaps the simplest: it moves the parameters a small step in the direction of the negative gradient. $$ \boldsymbol{w}^{\tau + 1} = \boldsymbol{w}^{\tau} - \eta \nabla_{\boldsymbol{w}^{\tau}} E \tag{1.1} $$ where $\eta$, $\tau$, and $E$ denote the learning rate ($\eta > 0$), the iteration step, and the loss function. Wait!... (a runnable toy version of update (1.1) is sketched below)

September 7, 2022 · 1051 words
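A minimal runnable sketch of update (1.1), assuming a toy least-squares loss $E(\boldsymbol{w}) = \lVert X\boldsymbol{w} - \boldsymbol{y}\rVert^2$; the data, step size, and iteration count are made up for illustration.

```python
import numpy as np

# Toy least-squares problem: E(w) = ||Xw - y||^2 with a known w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

eta = 0.005             # learning rate, eta > 0
w = np.zeros(3)         # initial parameters w^0
for tau in range(500):  # iteration step tau
    grad = 2 * X.T @ (X @ w - y)  # gradient of E at w^tau
    w = w - eta * grad            # update (1.1): w^{tau+1} = w^tau - eta * grad

print(w)  # should be close to w_true
```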

Tips for Training Neural Networks

Recently, I read a blog post about training neural networks (abbreviated as NN in the rest of this post), and it is really amazing. In this post I am going to add my own experience while summarizing the interesting parts of that blog. Nowadays it seems that training NNs is extremely easy, since there are plenty of free frameworks that are simple to use (e.g. PyTorch, NumPy, TensorFlow). Well,...

July 30, 2022 · 1798 words