[Dive into Deep Learning] Study Notes on Generative Adversarial Networks (GAN)


Original paper: Generative Adversarial Nets (neurips.cc)

Mu Li's paragraph-by-paragraph walkthrough of the GAN paper: GAN论文逐段精读【论文精读】 (Bilibili)

Paper code: http://www.github.com/goodfeli/adversarial

Goodfellow, I. J. et al. (2014) 'Generative Adversarial Nets', NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, pp. 2672–2680. doi: https://doi.org/10.48550/arXiv.1406.2661

Contents

1. Study of the Original GAN Paper

1.1. Abstract

1.2. Introduction

1.3. Related work

1.4. Adversarial nets

1.5. Theoretical Results

1.5.1. Global Optimality of p_g = p_data

1.5.2. Convergence of Algorithm 1

1.6. Experiments

1.7. Advantages and disadvantages

1.8. Conclusions and future work

2. Supplementary Knowledge

2.1. Divergence

2.2. A Few Casual Remarks


1. Study of the Original GAN Paper

1.1. Abstract

        ①They combine a generative model G and a discriminative model D into a single framework: G is the "cheating" part, which focuses on imitating the real data, and D is the "distinguishing" part, which focuses on telling where a sample comes from.

        ②This model relies on a minimax objective

        ③GANs need neither Markov chains nor unrolled approximate inference networks

        ④They designed qualitative and quantitative evaluations to demonstrate the potential of the framework

1.2. Introduction

        ①The authors praised deep learning and briefly mentioned its prospects

        ②Because directly fitting or approximating the ground-truth data distribution is difficult, they designed a new generative model

        ③They compare the generative model to a counterfeiter and the discriminative model to the police. The two sides push each other to improve, and the authors ultimately hope the counterfeits become indistinguishable from the genuine article

        ④Both G and D are MLPs; G takes random noise as input and transforms it into samples

        ⑤Training requires only backpropagation and dropout

corpora  n. plural of corpus; collections of texts or data

counterfeiter  n. a person who makes fraudulent imitations, e.g. of money

1.3. Related work

        ①Recent work has concentrated on models with explicitly specified probability distributions, such as the successful deep Boltzmann machine. However, their likelihood functions are intractable.

        ②This motivates "generative machines", which only generate samples without explicitly representing the likelihood. Generative stochastic networks are a classic example.

        ③Such machines can be trained by backpropagation, building on the identity:

\lim_{\sigma\to0}\nabla_{\boldsymbol{x}}\mathbb{E}_{\epsilon\sim\mathcal{N}(0,\sigma^2\boldsymbol{I})}f(\boldsymbol{x}+\epsilon)=\nabla_{\boldsymbol{x}}f(\boldsymbol{x})
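A quick numerical illustration of this identity (my own sketch, not from the paper): estimate the gradient of \mathbb{E}[f(x+\epsilon)] by Monte Carlo with the reparameterization x+\epsilon, and watch it approach \nabla_x f(x)=\cos(x) as \sigma\to0.

```python
import torch

# Monte Carlo check: grad_x E_{eps~N(0,sigma^2)}[f(x+eps)] -> grad_x f(x) as sigma -> 0.
def smoothed_grad(f, x, sigma, n_samples=200_000):
    x = x.clone().requires_grad_(True)
    eps = torch.randn(n_samples) * sigma    # eps ~ N(0, sigma^2 I)
    f(x + eps).mean().backward()            # Monte Carlo estimate of E[f(x + eps)]
    return x.grad.item()

x = torch.tensor(1.0)
for sigma in (1.0, 0.1, 0.01):
    print(sigma, smoothed_grad(torch.sin, x, sigma))  # approaches cos(1.0) ~= 0.5403
```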

        ④Variational autoencoders (VAEs) of Kingma and Welling and of Rezende et al. do similar work. However, VAEs require differentiation through the hidden units (so they cannot have discrete latent variables), whereas GANs require differentiation through the visible units (so they cannot model discrete data).

        ⑤Some other methods also use a discriminative criterion to train a generative model, e.g. noise-contrastive estimation (NCE), which learns by discriminating data from a fixed noise distribution, but NCE is limited by the restricted form of its "discriminator".

        ⑥The most closely related work is predictability minimization (PM), in which each hidden unit is trained to differ from the prediction made by the other hidden units. However, PM differs from GAN in that (a) PM is an optimization problem minimizing an objective function, while GAN is a minimax game over a value function, (b) PM's competition acts only as a regularizer, (c) in PM the two networks' outputs are directly made similar or different, while in GAN the generator produces a rich high-dimensional sample that is fed as input to the discriminator

        ⑦Adversarial examples are inputs crafted to be misclassified; they are a tool for analyzing classifiers, with no generative mechanism involved

1.4. Adversarial nets

        ①They designed a two-player minimax game with value function V(D,G):

\min_G\max_D V(D,G)=\mathbb{E}_{\boldsymbol{x}\sim p_{\mathrm{data}}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z}\sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log(1-D(G(\boldsymbol{z})))]

where p_{g} denotes the generator's distribution over the data,

\boldsymbol{x} represents a data sample,

p_{\boldsymbol{z}}(\boldsymbol{z}) denotes a prior distribution over the input noise,

G\left ( \boldsymbol{z};\theta _{g} \right ) denotes a differentiable function, namely an MLP with parameters \theta _{g} , that maps noise to data space,

D\left ( \boldsymbol{x};\theta _{d} \right ) denotes another MLP whose output is a scalar: the probability that \boldsymbol{x} came from the real data rather than from p_{g}

        ②They train D and G simultaneously: D is trained to maximize V while G is trained to minimize it

        ③Optimizing D to completion in the inner loop is computationally prohibitive and would overfit on finite datasets. Hence, they alternate k steps of optimizing D with one step of optimizing G

        ④Early in training, when G is still poor, D rejects generated samples with high confidence, so \log(1-D(G(\boldsymbol{z}))) saturates; training G to maximize \log D(G(\boldsymbol{z})) instead provides much stronger gradients (illustrated below)
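A small numerical illustration of this saturation argument (my sketch, with illustrative logit values): writing D(G(\boldsymbol{z}))=\sigma(a) for logit a, the gradient of \log(1-D) with respect to a is -\sigma(a), which vanishes as D(G(\boldsymbol{z}))\to 0, while the gradient of \log D is 1-\sigma(a), which stays near 1.

```python
import torch

# As D confidently rejects fakes (very negative logit a, sigmoid(a) ~ 0):
#   d/da log(1 - sigmoid(a)) = -sigmoid(a)    -> 0  (saturating objective)
#   d/da log(sigmoid(a))     = 1 - sigmoid(a) -> 1  (the "maximize log D" heuristic)
for logit in (-1.0, -4.0, -8.0):
    a = torch.tensor(logit, requires_grad=True)
    torch.log(1 - torch.sigmoid(a)).backward()
    sat = a.grad.item()
    b = torch.tensor(logit, requires_grad=True)
    torch.log(torch.sigmoid(b)).backward()
    nonsat = b.grad.item()
    d_fake = torch.sigmoid(torch.tensor(logit)).item()
    print(f"D(G(z))={d_fake:.4f}  grad log(1-D)={sat:.4f}  grad log D={nonsat:.4f}")
```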

pedagogical  adj. relating to teaching or education

1.5. Theoretical Results

        ①Their illustrative diagram (Figure 1 in the paper, not reproduced here):

where D is the blue dashed line, the generative distribution p_{g} is the green solid line, and the real data distribution is the black dotted line,

D converges to \frac{p_\mathrm{data}(\boldsymbol{x})}{p_\mathrm{data}(\boldsymbol{x})+p_g(\boldsymbol{x})}. When this equals \frac{1}{2}, i.e. p_{\mathrm{data}}=p_{g}, D can no longer tell real data from generated data

        ②Pseudocode of GAN (Algorithm 1 in the paper; a code sketch follows, since the figure is not reproduced here):
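Below is a minimal PyTorch sketch of Algorithm 1's structure: minibatch training with k discriminator ascent steps per generator descent step. The network shapes, optimizer, learning rates, and the toy data distribution are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

z_dim, x_dim, k = 16, 2, 1   # hypothetical sizes; k as in Algorithm 1
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_G = torch.optim.SGD(G.parameters(), lr=1e-2)
opt_D = torch.optim.SGD(D.parameters(), lr=1e-2)

def sample_data(m):                       # stand-in for p_data: a shifted Gaussian
    return torch.randn(m, x_dim) + 3.0

for step in range(1000):
    for _ in range(k):                    # ascend V with respect to D
        x = sample_data(64)
        z = torch.randn(64, z_dim)        # z ~ p_z, the noise prior
        loss_D = -(torch.log(D(x)) + torch.log(1 - D(G(z).detach()))).mean()
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    z = torch.randn(64, z_dim)            # one step descending V with respect to G
    loss_G = torch.log(1 - D(G(z))).mean()  # Algorithm 1's form; in practice the
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()  # -log D(G(z)) form is preferred
```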

1.5.1. Global Optimality of p_g = p_data

        ①For any fixed G, the optimal discriminator D maximizes V:

\begin{gathered} V(G,D) =\int_{\boldsymbol{x}}p_{\mathrm{data}}(\boldsymbol{x})\log(D(\boldsymbol{x}))dx+\int_{\boldsymbol{z}}p_{\boldsymbol{z}}(\boldsymbol{z})\log(1-D(g(\boldsymbol{z})))dz \\ =\int_{\boldsymbol{x}}p_{\mathrm{data}}(\boldsymbol{x})\log(D(\boldsymbol{x}))+p_{g}(\boldsymbol{x})\log(1-D(\boldsymbol{x}))dx \end{gathered}

for any \left ( a,b \right )\in\mathbb{R}^{2}\setminus\left \{ 0,0 \right \}, the function y\mapsto a\log\left ( y \right )+b\log\left ( 1-y \right ) achieves its maximum in \left [ 0,1 \right ] at y=\frac{a}{a+b} (see the check below). Thus D_{G}^{*}=\frac{p_\mathrm{data}(\boldsymbol{x})}{p_\mathrm{data}(\boldsymbol{x})+p_g(\boldsymbol{x})}
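Setting the derivative to zero recovers this maximizer (a one-line check added here):

f(y)=a\log y+b\log(1-y),\qquad f^{\prime}(y)=\frac{a}{y}-\frac{b}{1-y}=0\;\Rightarrow\;a(1-y)=by\;\Rightarrow\;y=\frac{a}{a+b}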

        ②Substituting the optimal discriminator D_{G}^{*} into V reformulates the criterion as:

\begin{aligned} C(G)& =\operatorname*{max}_{D}V(G,D) \\ &=\mathbb{E}_{\boldsymbol{x}\sim p_{\mathrm{data}}}[\operatorname{log}D_{G}^{*}(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z}\sim p_{\boldsymbol{z}}}[\operatorname{log}(1-D_{G}^{*}(G(\boldsymbol{z})))] \\ &=\mathbb{E}_{\boldsymbol{x}\sim p_{\mathrm{data}}}[\operatorname{log}D_{G}^{*}(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{x}\sim p_{g}}[\operatorname{log}(1-D_{G}^{*}(\boldsymbol{x}))] \\ &=\mathbb{E}_{\boldsymbol{x}\sim p_\mathrm{data}}\left[\log\frac{p_\mathrm{data}(\boldsymbol{x})}{p_\mathrm{data}(\boldsymbol{x})+p_g(\boldsymbol{x})}\right]+\mathbb{E}_{\boldsymbol{x}\sim p_g}\left[\log\frac{p_g(\boldsymbol{x})}{p_\mathrm{data}(\boldsymbol{x})+p_g(\boldsymbol{x})}\right] \end{aligned}

        ③C\left ( G \right ) attains its global minimum -\log 4 exactly when p_{g}=p_{\mathrm{data}}: then D_G^*(\boldsymbol{x})=\frac12 and C(G)=\mathbb{E}[\log\frac12]+\mathbb{E}[\log\frac12]=-\log 4

        ④To see this in general, rewrite C\left ( G \right ) with KL divergence:

C(G)=-\log(4)+KL\left(p_\text{data}\left\Vert\frac{p_\text{data}+p_g}2\right.\right)+KL\left(p_g\left\Vert\frac{p_\text{data}+p_g}2\right.\right)

        ⑤or, equivalently, with JS divergence:

C(G)=-\log(4)+2\cdot JSD\left(p_\text{data}\left\|p_g\right.\right)

and since the JS divergence between two distributions is non-negative and equals zero only when they coincide, C^{*}=-\log 4 is the global minimum of C\left ( G \right ), attained only at p_{g}=p_{\text{data}}
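A quick numerical check of this identity on a toy discrete support (my sketch; the two distributions below are arbitrary): compute C(G) directly from the expectation form above and compare it with -\log(4)+2\cdot JSD.

```python
import numpy as np

def kl(p, q):                       # KL(p || q) for discrete distributions
    return np.sum(p * np.log(p / q))

p_data = np.array([0.7, 0.2, 0.1])  # arbitrary toy distributions
p_g    = np.array([0.3, 0.3, 0.4])
m = (p_data + p_g) / 2              # the mixture used by the JS divergence

c_direct = (p_data * np.log(p_data / (p_data + p_g))).sum() \
         + (p_g    * np.log(p_g    / (p_data + p_g))).sum()
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)
print(c_direct, -np.log(4) + 2 * jsd)   # the two numbers coincide
```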

1.5.2. Convergence of Algorithm 1

        ①If G and D have enough capacity and, at each step, D is allowed to reach its optimum given G while p_{g} is updated to improve the criterion, then p_{g} converges to p_{data}, because the criterion \sup_{D}V(G,D) is convex in p_{g}

        ②Parzen window-based log-likelihood estimates:

Model | MNIST | TFD
DBN [3] | 138 ± 2 | 1909 ± 66
Stacked CAE [3] | 121 ± 1.6 | **2110 ± 50**
Deep GSN [5] | 214 ± 1.1 | 1890 ± 29
Adversarial nets | **225 ± 2** | **2057 ± 26**

where the MNIST numbers are the mean log-likelihood of samples on the test set with the standard error of the mean computed across examples, the TFD standard errors are computed across folds of the dataset, and bold marks the best results

supremum  n. the least upper bound

1.6. Experiments

        ①Datasets: MNIST, Toronto Face Database (TFD), CIFAR-10

        ②Activations: a mixture of ReLU and sigmoid for the generator; maxout for the discriminator

        ③Dropout is applied when training the discriminator

        ④Noise is allowed only as the input to the bottommost layer of the generator

        ⑤Their Gaussian Parzen window estimates have somewhat high variance and perform poorly in high-dimensional spaces
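For concreteness, a sketch of what a Gaussian Parzen window log-likelihood estimate looks like (the dimensions, sample counts, and \sigma below are purely illustrative; in the paper \sigma was chosen by cross-validation on a validation set):

```python
import numpy as np
from scipy.special import logsumexp

# Mean log-likelihood of test points under a mixture of isotropic Gaussians
# centered on generator samples:
#   log p(x) = logsumexp_i(-||x - s_i||^2 / (2 sigma^2)) - log n - (d/2) log(2 pi sigma^2)
def parzen_mean_log_likelihood(test, samples, sigma):
    n, d = samples.shape
    sq = ((test[:, None, :] - samples[None, :, :]) ** 2).sum(-1)  # (n_test, n)
    log_p = logsumexp(-sq / (2 * sigma**2), axis=1) \
            - np.log(n) - d / 2 * np.log(2 * np.pi * sigma**2)
    return log_p.mean()

rng = np.random.default_rng(0)
gen = rng.normal(size=(2000, 5))     # stand-in for generator samples
test = rng.normal(size=(500, 5))     # stand-in for the test set
print(parzen_mean_log_likelihood(test, gen, sigma=0.5))
```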

1.7. Advantages and disadvantages

(1)Disadvantages

        ①There is no explicit representation of p_{g}\left ( \boldsymbol{x} \right )

        ②D must be kept well synchronized with G during training; in particular, G must not be trained too much without updating D, otherwise G may collapse too many values of \boldsymbol{z} to the same \boldsymbol{x}

(2)Advantages

        ①No need for Markov chain

        ②G is updated only with gradients flowing through D, never directly with data examples

        ③Adversarial nets can represent very sharp, even degenerate distributions

1.8. Conclusions and future work

        ①Visualization of model samples, where the rightmost column (with yellow outlines) shows the nearest training example of the neighboring sample, for (a) MNIST, (b) TFD, (c) CIFAR-10 (fully connected model), (d) CIFAR-10 (convolutional discriminator and "deconvolutional" generator):

        ②"Digits obtained by linearly interpolating between coordinates in z space of the full model":

        ③Their summary of challenges in different parts:

interpolate  v. (math) to estimate intermediate values between known values; to insert (words, remarks, etc.)

2. Supplementary Knowledge

2.1. Divergence

(1)Kullback–Leibler divergence (KL divergence)

        ①Related link: 机器学习:Kullback-Leibler Divergence (KL 散度)_kullback-leibler散度-CSDN博客

        ②关于KL散度(Kullback-Leibler Divergence)的笔记 - 知乎 (zhihu.com)

(2)Jensen–Shannon divergence (JS divergence)

        ①理解JS散度(Jensen–Shannon divergence)-CSDN博客

2.2. A Few Casual Remarks

(1) The paper itself is concise and easy to follow, and with Mu Li's walkthrough there is not much left to puzzle over. The source code released by the authors, however, is rather overwhelming, and the README does not say much, so it is quite hard to get started with; not recommended for newcomers at all

