class: center, middle, inverse, title-slide # Generalized Linear Models Using TensorFlow ### Julio Trecenti ### 2019-03-27 --- <style type="text/css"> .remark-slide-content { font-size: 25px; padding: 1em 4em 1em 4em; } </style> # What do we want? .large[ `$$\hat{\boldsymbol \beta} = (\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\mathbf y$$` ] -- ## What are we going to do? - We are going to compute this thing in parallel. - **Spoiler**: Our proposal is faster when the number of predictors is large. --- # Generalized linear models ## Iterated Weighted Least Squares (IWLS) `$$\boldsymbol\beta^{(m+1)} = (\mathbf X^{\top}\mathbf W^{(m)}\mathbf X)^{-1}\mathbf X^{\top}\mathbf W^{(m)} \mathbf z^{(m)}$$` where `$$\mathbf z = \boldsymbol \eta + \mathbf W^{-\frac 1 2}\mathbf V^{-\frac 1 2}(\mathbf y - \boldsymbol \mu)$$` -- but `$$\begin{aligned} (\mathbf X^{\top}\mathbf W\mathbf X)^{-1}\mathbf X^{\top}\mathbf W \mathbf z &= ((\mathbf X^{\top}\mathbf W^{\frac1 2})(\mathbf W^{\frac1 2}\mathbf X))^{-1} (\mathbf X^{\top}\mathbf W^{\frac1 2})(\mathbf W^{\frac1 2} \mathbf z) \\ &= (\mathbf A^\top\mathbf A)^{-1}\mathbf A^\top \mathbf b \end{aligned}$$` --- # How is it actually done? ## QR Decomposition `$$\mathbf A = \mathbf Q\mathbf R,$$` where `\(\mathbf Q^\top\mathbf Q = \mathbf I\)` and `\(\mathbf R\)` is triangular. -- Then `$$\begin{aligned} (\mathbf A^\top\mathbf A)^{-1}\mathbf A^\top \mathbf b &= ((\mathbf Q \mathbf R)^\top (\mathbf Q \mathbf R))^{-1} (\mathbf Q\mathbf R)^\top \mathbf b \\ &= (\mathbf R^\top \mathbf Q^\top\mathbf Q\mathbf R)^{-1}\mathbf R^\top \mathbf Q^\top \mathbf b\\ &= \mathbf R^{-1}(\mathbf R)^{-1}\mathbf R^{\top}\mathbf Q^\top\mathbf b \\ &= \mathbf R^{-1}\mathbf Q^\top\mathbf b. \end{aligned}$$` --- # Actual implementation - Uses `LINPACK` algorithm, implemented in `Fortran` language. - Called by `C_Cdqrls`, implemented in `C` language. - Single threaded. - You can see what is done running `.lm.fit`. - Source code [here](https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/lm.c) and [here](https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/appl/dqrls.f). -- ### Alternative: `parglm` package - Implemented in `C++` language, also using `LINPACK`. - Multi-threaded. - Additional cost: `\(O(Mp^2+p^3)\)`, where `\(M\)` is the number of threads. - You can see what is done running `parglm:::parallelglm`. - Source code [here](https://github.com/boennecd/parglm/blob/master/src/parallel_qr.cpp). --- # TensorFlow - Runs in parallel by default. - Can take advantage of Graphics Processing Units (GPUs) -- ## `tensorglm` package - Uses QR decomposition from TensorFlow. - Implements `glm.fit` method using it. - The new call should be ```r glm(y ~ X, method = "glmtf.fit") ``` or ```r glmtf(y ~ X) ``` --- # Computation <img src="img/TensorBoard.png" width="80%" /> --- # Results <img src="img/grafico_bench_1.png" width="100%" /> --- # Results - Square root <img src="img/grafico_bench_sqrt_1.png" width="100%" /> -- Rule of thumb: `\(p > 5\sqrt N\)` --- # Known Issues - Does not compute `\(p\)`-values and confidence intervals yet. - Does not control tolerance. - The results may differ by the order of `\(10^{-6}\)`. --- # Next steps - More tests. - Submit to CRAN. ## Future - Implementation in Torch. - Implement gradient descent for out-memory databases (very large `\(N\)`) - Implement LASSO alternative. --- # Links - Presentation: https://jtrecenti.github.com/slides/emr2019 - GitHub: https://github.com/jtrecenti/tensorglm - Email: [jtrecenti@curso-r.com](mailto:jtrecenti@curso-r.com) ## References - [TensorFlow](https://www.tensorflow.org/). - [QR Decomposition](https://www.tensorflow.org/api_docs/python/tf/linalg/qr).