Paper / DL / BP: "Understanding the difficulty of training deep feedforward neural networks"


Contents

Reading the Original Paper

Paper Content and Key Points

Conclusions


Reading the Original Paper

Original paper: Understanding the difficulty of training deep feedforward neural networks

 

Paper Content and Key Points

Limitations of sigmoid in a four-layer network


With the sigmoid activation, the test loss and training loss stay stuck at about 0.5 for many epochs before finally making an underwhelming drop to around 0.1.

 

     We hypothesize that this behavior is due to the combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior.
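To see why a saturated sigmoid stalls learning, here is a small numerical illustration (not from the paper): a unit whose output has been pushed to 0 has a large negative pre-activation, and since the sigmoid's derivative is sigmoid(x) * (1 - sigmoid(x)), the gradient flowing back through that unit is essentially zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pre-activations from mildly negative to strongly negative (saturated regime).
for x in [-1.0, -5.0, -10.0]:
    s = sigmoid(x)
    grad = s * (1.0 - s)  # derivative of the sigmoid at x
    print(f"x = {x:6.1f}   sigmoid = {s:.5f}   gradient = {grad:.5f}")

# Approximate output:
# x =   -1.0   sigmoid = 0.26894   gradient = 0.19661
# x =   -5.0   sigmoid = 0.00669   gradient = 0.00665
# x =  -10.0   sigmoid = 0.00005   gradient = 0.00005
```

Once the output is near 0, gradient descent can barely move the unit out of the saturated regime, which matches the long plateau described above.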

 

Limitations of tanh and softsign in a five-layer network



Switching to the tanh activation, the network converges well and quickly.
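The heading also mentions softsign. As an illustrative aside (not code from the paper), softsign(x) = x / (1 + |x|) approaches its asymptotes only polynomially, while tanh saturates exponentially fast; this is the "gentler non-linearity" that the conclusions below credit for softsign's robustness to initialization.

```python
import numpy as np

def softsign(x):
    # softsign approaches +/-1 much more slowly than tanh
    return x / (1.0 + np.abs(x))

for x in [1.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   tanh = {np.tanh(x):.4f}   softsign = {softsign(x):.4f}")

# Approximate output:
# x =   1.0   tanh = 0.7616   softsign = 0.5000
# x =   2.0   tanh = 0.9640   softsign = 0.6667
# x =   5.0   tanh = 0.9999   softsign = 0.8333
# x =  10.0   tanh = 1.0000   softsign = 0.9091
```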

 

Conclusions

1、The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network. We call it the normalized initialization
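Concretely, the normalized initialization in the paper draws each weight of a layer with fan-in n_j and fan-out n_{j+1} uniformly from [-sqrt(6/(n_j + n_{j+1})), +sqrt(6/(n_j + n_{j+1}))]. Below is a minimal sketch of this rule (the layer sizes are hypothetical, chosen only for illustration):

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng=None):
    """Normalized (Xavier/Glorot) uniform initialization:
    W ~ U[-sqrt(6/(fan_in + fan_out)), +sqrt(6/(fan_in + fan_out))]."""
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Hypothetical layer sizes, for illustration only.
W = normalized_init(784, 256)
print(W.shape, W.std())  # empirical std close to sqrt(2 / (784 + 256)), about 0.044
```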


2、The results show that the activation distributions are more even across layers.

     Activation values normalized histograms with hyperbolic tangent activation, with standard (top) vs normalized initialization (bottom). Top: 0-peak increases for higher layers.
     Several conclusions can be drawn from these error curves:
(1)、The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.
(2)、The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.
(3)、For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward).
3、In the error curves, "Sigmoid depth 5" denotes a sigmoid network with 5 layers and "N" marks the normalized initialization; the curves show that unsupervised pre-training reaches a lower error.
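To make point 2 and conclusion (3) concrete, here is a small simulation sketch (the layer sizes and the "standard" range U[-1/sqrt(fan_in), 1/sqrt(fan_in)] are assumptions for illustration, not the paper's exact experimental setup): random inputs are pushed through a deep tanh network and the per-layer activation standard deviation is printed.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [256] * 8                      # hypothetical: 8 tanh layers of 256 units
x = rng.standard_normal((1000, layer_sizes[0]))

for scheme in ("standard", "normalized"):
    h = x
    stds = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        if scheme == "standard":
            limit = 1.0 / np.sqrt(fan_in)              # U[-1/sqrt(n), 1/sqrt(n)]
        else:
            limit = np.sqrt(6.0 / (fan_in + fan_out))  # normalized initialization
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        h = np.tanh(h @ W)
        stds.append(round(float(h.std()), 3))
    print(scheme, stds)
```

With the standard scheme the printed standard deviations decay towards zero in the deeper layers, whereas with the normalized scheme they settle around a roughly constant value, which mirrors the behaviour described in the quoted histogram caption above.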




Related paper
Understanding the difficulty of training deep feedforward neural networks, by Xavier Glorot and Yoshua Bengio.
