Deep Learning using R

class: center, middle, inverse, title-slide

# Deep Learning using R
### Curso-R
### 2018-12-03

---

# Goals

* What are deep neural networks and how do they work?
* What software can we use to train these models, and how do the tools relate to each other?
* How do we train deep learning models for some prediction problems?

---

# Prerequisites

* Linear regression
* Logistic regression
* R: pipe (`%>%`)

---

# References

* [Deep Learning Book](https://www.deeplearningbook.org)
* [Deep Learning with R](https://www.manning.com/books/deep-learning-with-r)
* [Tensorflow for R Blog](https://blogs.rstudio.com/tensorflow/)
* [Keras examples](https://keras.rstudio.com/articles/examples/index.html)
* [Colah's blog](http://colah.github.io)

![](https://images-na.ssl-images-amazon.com/images/I/61fim5QqaqL._SX373_BO1,204,203,200_.jpg)

---

# Why "Deep" Learning?

* We compose many nonlinear operations, called *layers*, to learn a representation of the data
* The number of layers is the model's *depth*
* Modern models can have more than 100 layers

## Alternative names

- layered representations learning
- hierarchical representations learning

---

# Layers

![](https://user-images.githubusercontent.com/4706822/48164108-b9013200-e2c8-11e8-86ef-652bd7f6b19a.png)

---

# Deep Learning

![](https://user-images.githubusercontent.com/4706822/48164481-c834af80-e2c9-11e8-97d9-6cf234454aa2.png)

---

# Deep Learning

![](https://user-images.githubusercontent.com/4706822/48164502-d5ea3500-e2c9-11e8-8dea-150dff09131b.png)

---

# Deep Learning

![](https://user-images.githubusercontent.com/4706822/48164527-e4d0e780-e2c9-11e8-91b5-1490cd3eca92.png)

---

# Relation to Generalized Linear Models

- Linear regression: single-layer neural network, no activation
- Logistic regression: single-layer neural network, sigmoid (inverse logit) activation

---

# Logistic regression
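
As a minimal sketch of the "single layer plus sigmoid" view, the code below fits a logistic regression with `glm()` and rebuilds its fitted probabilities by hand. The simulated data and the names `w`, `b`, and `network_output` are illustrative assumptions.

```r
set.seed(42)

# Simulated binary data (assumption, for illustration only)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 2 * x))

# Classical logistic regression
fit <- glm(y ~ x, family = binomial)

# The same model read as one layer: a weight, a bias, a sigmoid
sigmoid <- function(z) 1 / (1 + exp(-z))
w <- coef(fit)[["x"]]
b <- coef(fit)[["(Intercept)"]]
network_output <- sigmoid(w * x + b)

# The hand-built layer reproduces the fitted probabilities
all.equal(network_output, unname(fitted(fit)))
```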

---

## Deviance function

`$$D(y,\hat\mu(x)) = \sum_{i=1}^n 2\left[y_i\log\frac{y_i}{\hat\mu_i(x_i)} + (1-y_i)\log\left(\frac{1-y_i}{1-\hat\mu_i(x_i)}\right)\right]$$`

`$$= 2 D_{KL}\left(y||\hat\mu(x)\right),$$`

where `$D_{KL}(p||q) = \sum_i p_i\log\frac{p_i}{q_i}$` is the Kullback-Leibler divergence.
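
When every `$y_i$` is 0 or 1, the terms `$y_i\log y_i$` vanish (taking `$0\log 0 = 0$`), so the deviance reduces to `$-2$` times the log-likelihood, that is, twice the summed binary cross-entropy. A small check on simulated data (the data and the name `manual_deviance` are assumptions):

```r
set.seed(1)

# Simulated binary data (assumption, for illustration only)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 2 * x))

fit <- glm(y ~ x, family = binomial)
mu <- fitted(fit)

# Twice the summed binary cross-entropy between y and the fitted means
manual_deviance <- -2 * sum(y * log(mu) + (1 - y) * log(1 - mu))

# Agrees with the residual deviance reported by glm()
all.equal(manual_deviance, deviance(fit))
```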

---

## Deep learning

- Apply a linear transformation to `$x$`, add a bias, then apply a nonlinear activation `$\sigma$`.

`$$f(x) = \sigma(wx + b)$$`
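
A minimal base R sketch of this formula, assuming `$\sigma$` is the sigmoid and `$x$` is a column vector. The helper name `dense_layer()` and all weights are hypothetical; stacking two calls gives a network of depth two.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One layer: linear transformation, bias, nonlinear activation
dense_layer <- function(x, w, b) sigmoid(w %*% x + b)

x <- c(1, 1)

# First layer: 2 inputs -> 2 hidden units (made-up weights)
w1 <- matrix(c(1, -1,
               0.5, 0.5), nrow = 2, byrow = TRUE)
b1 <- c(0, -1)
h <- dense_layer(x, w1, b1)

# Second layer: 2 hidden units -> 1 output
w2 <- matrix(c(1, 1), nrow = 1)
b2 <- -1
out <- dense_layer(h, w2, b2)

out
```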

---

## Loss function

`$$D_{KL}(p(x)||q(x))$$`
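
As a sketch, the divergence can be computed for two discrete distributions as follows. The function name `kl_divergence()` and the example vectors are assumptions. When `p` is one-hot, the result is just `-log` of the probability that `q` gives to the true class.

```r
kl_divergence <- function(p, q) {
  keep <- p > 0  # terms with p_i = 0 contribute 0 by convention
  sum(p[keep] * log(p[keep] / q[keep]))
}

p <- c(0, 1, 0)        # observed class, one-hot encoded
q <- c(0.2, 0.7, 0.1)  # predicted probabilities

kl_divergence(p, q)    # same value as -log(0.7)
kl_divergence(p, p)    # 0: a distribution has no divergence from itself
```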

---

# Optimization: Stochastic Gradient Descent

```
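for (i in seq_len(num_epochs)) {
  # in SGD, `data` is a random mini-batch rather than the full dataset
  grads <- compute_gradient(data, params)
  # step against the gradient, scaled by the learning rate
  params <- params - learning_rate * grads
}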
```

---

# SGD
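
As a minimal runnable sketch of stochastic gradient descent, the loop below fits a linear regression on simulated data. The data, the `compute_gradient()` helper, the batch size, the number of steps, and the learning rate are all illustrative assumptions.

```r
set.seed(123)

# Simulated data (assumption): y = 1 + 2 * x + noise
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)
X <- cbind(1, x)  # design matrix with an intercept column

# Gradient of the mean squared error with respect to the parameters
compute_gradient <- function(X, y, params) {
  residuals <- as.vector(X %*% params) - y
  as.vector(2 * t(X) %*% residuals / nrow(X))
}

params <- c(0, 0)
learning_rate <- 0.1
batch_size <- 32
num_steps <- 500

for (step in seq_len(num_steps)) {
  # a fresh random mini-batch at every step makes the descent stochastic
  idx <- sample(n, batch_size)
  grads <- compute_gradient(X[idx, , drop = FALSE], y[idx], params)
  params <- params - learning_rate * grads
}

params
```

After the loop, `params` should be close to the intercept and slope used to simulate the data, `c(1, 2)`.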

---

## TensorFlow

A library for numerical computation

- Developed by Google Brain for neural network research
- Open source
- Automatic differentiation
- Runs on GPUs

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/AutomaticDifferentiationNutshell.png/1280px-AutomaticDifferentiationNutshell.png)

---

## Tensor

(2d)

```
##       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##  [1,]          5.1         3.5          1.4         0.2       1
##  [2,]          4.9         3.0          1.4         0.2       1
##  [3,]          4.7         3.2          1.3         0.2       1
##  [4,]          4.6         3.1          1.5         0.2       1
##  [5,]          5.0         3.6          1.4         0.2       1
##  [6,]          5.4         3.9          1.7         0.4       1
##  [7,]          4.6         3.4          1.4         0.3       1
##  [8,]          5.0         3.4          1.5         0.2       1
##  [9,]          4.4         2.9          1.4         0.2       1
## [10,]          4.9         3.1          1.5         0.1       1
```
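
A matrix like the one above can be built in base R along these lines; this is an assumption about how the output was produced, not the original code. `data.matrix()` converts the `Species` factor to its integer codes, so the result is a purely numeric 2D tensor.

```r
# First ten rows of iris as a numeric matrix (2D tensor)
x <- data.matrix(head(iris, 10))

dim(x)  # rows (samples) and columns (features)
x
```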

---

## Tensor

(3d)

![](https://github.com/curso-r/deep-learning-R/blob/master/3d-tensor.png?raw=true)

---

## Tensor

(4d)
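
In R, higher-dimensional tensors can be represented with `array()`. The sketch below assumes a batch of colour images laid out as samples x height x width x channels (the "channels last" layout that Keras uses by default); the sizes and random values are made up.

```r
# 3D tensor (assumed layout): samples x timesteps x features
sequences <- array(0, dim = c(10, 5, 3))

# 4D tensor (assumed layout): samples x height x width x channels
images <- array(runif(2 * 28 * 28 * 3), dim = c(2, 28, 28, 3))

dim(sequences)
dim(images)

# Indexing the first sample drops one dimension: a single 28 x 28 RGB image
first_image <- images[1, , , ]
dim(first_image)
```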

---

## TensorFlow

.pull-left[
  ![](https://github.com/curso-r/deep-learning-R/blob/master/flow.gif?raw=true)
]

.pull-right[
  - Define the graph
  - Compile and optimize
  - Execute
  - Nodes are calculations
  - Tensors *flow* along the edges between nodes
]

---

## Keras

* API for specifying deep learning models in an intuitive way.
* Created by François Chollet (@fchollet).

* Originally implemented in Python.

---

## Keras for R

![](imgs/keras.svg)

---

## Keras + R

* R package: [`keras`](https://github.com/rstudio/keras).
* Built on [reticulate](https://github.com/rstudio/reticulate).
* Developed by JJ Allaire (CEO at RStudio).
* R-like syntax using `%>%`.

---

# Example 01

---

# Activation

![](https://user-images.githubusercontent.com/4706822/48278301-48772400-e434-11e8-9487-641d2a79c2f8.png)
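
As a sketch, three activation functions commonly used in neural networks can be written in base R as follows. Which functions the figure shows is an assumption; `sigmoid()` and `relu()` are defined here, while `tanh()` is built into R.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))  # squashes values into (0, 1)
relu <- function(x) pmax(x, 0)            # zero for negatives, identity otherwise

x <- c(-2, -1, 0, 1, 2)

sigmoid(x)
tanh(x)  # squashes values into (-1, 1)
relu(x)
```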

---

# Activation problems

![](https://user-images.githubusercontent.com/4706822/48278617-334ec500-e435-11e8-8fcc-8838d2cc4590.png)

---

# Example 02

---

# Example 03

---

# Convolutions
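
A minimal base R sketch of a 2D convolution as used in deep learning: the kernel slides over the image without padding, with stride 1, and without being flipped (strictly a cross-correlation). The function name `conv2d()`, the image, and the kernel are illustrative assumptions.

```r
conv2d <- function(image, kernel) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  out_h <- nrow(image) - kh + 1
  out_w <- ncol(image) - kw + 1
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      # elementwise product of the kernel with the patch under it, then sum
      patch <- image[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}

image <- matrix(1:16, nrow = 4, byrow = TRUE)
kernel <- matrix(c(1, 0,
                   0, -1), nrow = 2, byrow = TRUE)

feature_map <- conv2d(image, kernel)
feature_map  # 3 x 3; every entry is -5 for this image and kernel
```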

---

# Convolutions

![](https://user-images.githubusercontent.com/4706822/48296585-99f8d080-e47f-11e8-9d91-23c5f6a55f03.png)

---

# Max Pooling

![](https://user-images.githubusercontent.com/4706822/48281479-df94a980-e43d-11e8-9dcf-e67d7ba053e4.png)
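
Max pooling can be sketched in base R as below, assuming non-overlapping square windows (window size equal to the stride). The function name `max_pool()` and the input matrix are made up for illustration.

```r
max_pool <- function(x, size = 2) {
  out_h <- nrow(x) %/% size
  out_w <- ncol(x) %/% size
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      rows <- ((i - 1) * size + 1):(i * size)
      cols <- ((j - 1) * size + 1):(j * size)
      # keep only the largest value in each window
      out[i, j] <- max(x[rows, cols])
    }
  }
  out
}

x <- matrix(c(1, 3, 2, 4,
              5, 6, 7, 8,
              3, 2, 1, 0,
              1, 2, 3, 4), nrow = 4, byrow = TRUE)

pooled <- max_pool(x)
pooled  # 2 x 2: rows are (6, 8) and (3, 4)
```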

---

# Convolutions

![](https://user-images.githubusercontent.com/4706822/48281946-6bf39c00-e43f-11e8-845c-2d08570d85a6.png)

---

# Binary Cross-Entropy

![](https://user-images.githubusercontent.com/4706822/48296375-34a3e000-e47d-11e8-9ecd-510d722f9c7f.png)

---

# Example 04

---

# Categorical Cross-Entropy

![](https://cdn-images-1.medium.com/max/1600/1*AlbV9jz2k3Ll1wEMCljdSg.png)
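
As a sketch for a single observation with three classes: softmax turns raw scores into probabilities, and the categorical cross-entropy is minus the log of the probability assigned to the true class. The function names and the numbers are assumptions.

```r
softmax <- function(z) {
  z <- z - max(z)  # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

categorical_crossentropy <- function(y_true, y_pred) {
  -sum(y_true * log(y_pred))
}

logits <- c(2, 1, 0.1)  # raw scores from the last layer (made up)
y_true <- c(1, 0, 0)    # one-hot label: the first class is the true one

y_pred <- softmax(logits)
loss <- categorical_crossentropy(y_true, y_pred)

y_pred
loss
```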

---

# Example 05

---

# Stalk me

- Curso-R: [jtrecenti@curso-r.com](mailto:jtrecenti@curso-r.com)
- CONRE-3: [jtrecenti@conre3.org.br](mailto:jtrecenti@conre3.org.br)

## Pages:

- https://curso-r.com
- https://curso-r.com/blog
- https://curso-r.com/material
- https://github.com/curso-r

Presentation: https://jtrecenti.github.io/slides/ime-dl/

Code: https://github.com/jtrecenti/slides