Thursday, April 3, 2025

Tables in Bookdown pdf_book format

Motivation

I once spent a few days trying to tame tables in the Bookdown pdf_book format. I tried a few R packages that were supposed to help me. They were not visual, took time to learn and, in the end, were not very useful. When I tried to add a caption to my table, the text that should have followed the table ended up inside it. Now I know that this was not the package authors' fault but a consequence of how the package output was rendered in Rmarkdown PDF output, and it may well be fixed by the time you read this post. I looked up the Bookdown book by Yihui Xie, and he favors the HTML format, or at least methods that work with both CSS and LaTeX. His preferred method did not even fit in the visible part of my RStudio editor window.

I decided to focus on the PDF format because I wanted a book with easy cross-references and an index. Plus, I can put it in my GitHub account, update versions and download it to read offline.

I searched the internet and found a lot of advice to switch to HTML output styled with CSS as well. Apparently I had no choice but to do it myself and get my hands dirty. In addition, I learned that my Rmarkdown file is converted to TeX format before it is turned into PDF, so why not write the LaTeX chunk I want directly, without relying on the correctness of Rmarkdown rendering?

I had already had an unsuccessful experience a few years back, when I needed a LaTeX package in Rmarkdown and it did not work. I decided to try again, hoping that Rmarkdown is more developed now. I examined different methods used in the Bookdown pdf_book format and discovered many LaTeX commands and environments that were used as is. I tried the basic LaTeX environment for a table, tabular, and it worked! For a really good table, though, you need the booktabs package.

LaTeX intro

If you do not know LaTeX at all, here are a few tips. Please remember that they work only for the pdf_book format. If you use them in text that you also want in HTML format, they won't work there.

1. LaTeX commands start with a backslash, which is not printed but only marks the command. Here is an example of a command that places everything after it on a new page:

\newpage

Here are a couple of others, which add vertical space if you want more distance between your sections, or between your picture and your table:

\vspace{5mm}

\vspace{1cm}

Notice the metric units (LaTeX also accepts others, such as pt and in). Both commands should be surrounded by empty lines, before and after, to work properly.

2. Any number of empty lines in LaTeX collapses into one, and any number of consecutive spaces within a line collapses into one, too. So it works the same way as in Markdown.

3. LaTeX has environments for specific page elements, like pictures, tables, formulas, font types and sizes. Their start and end must be explicitly marked, and they must be properly nested. Do not forget an environment's end or confuse which environment ends where! The LaTeX gods will be furious and stop helping you until you mend your ways. You are not even likely to see a helpful error message, because you are supposed to know what you did. 💣

4. The basic environment and many other LaTeX environments use the same command to go to the next line: a double backslash.

\\

Math is a whole new game. Here is the basic, simple form, where you only use dollar signs: for example, $x^2$ will be converted to x squared in math script on the PDF page.

But when you need to display a bunch of formulas together, LaTeX offers a choice of environments. By the way, to stop LaTeX from assuming that a $ opens a math environment, put a backslash before it; otherwise you will see a processing error. Say you want $5.67 to appear on your page: type it as \$5.67.
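
Here is a small sketch of both points. It assumes the amsmath package is available (as far as I know, the default LaTeX template used by R Markdown loads it). The escaped \$ prints a literal dollar sign, $x^2$ stays inline, and align, one of the multi-formula environments, lines the formulas up at the & marks. Inside align, the double backslash does end a line, just as it does in tabular. The price and the formulas are only illustrations.

```latex
The ticket costs \$5.67, and the square of $x$ is $x^2$.

\begin{align}
  % align (from amsmath) lines formulas up at the ampersands
  (x + 1)^2 &= (x + 1)(x + 1) \\
            &= x^2 + 2x + 1
\end{align}
```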

Back to tables!

The LaTeX environment we need here starts with \begin{tabular} and ends with \end{tabular}. Below is an example with all the necessary elements of a basic LaTeX tabular environment, followed by how it is rendered and an explanation.

\begin{tabular}{l l}
column 1, row 1 & column 2, row 1 \\
column 1, row 2 & column 2, row 2 \\
\end{tabular} 

Here is what you see on your PDF page:

Note how we correctly marked the start and the end of the environment. In addition, a few elements are mandatory.
1. As you can see, I have {l l} right after the environment opening. This is the column alignment specification. I have two entries because I have two columns, and the l option means that each column is left-aligned. The other options here are r for right and c for center.
2. Each row ends with a double backslash: \\
3. Within a row, entries in different columns are separated by the & symbol.
This table is rather bare-bones: it has no divider lines or borders.
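
To see the alignment options side by side, here is a sketch of a four-column table. The l, c and r options are the ones described above; p{3cm} is another standard LaTeX column type, not mentioned above, that wraps long text to the given width. The cell contents are placeholders, and the extra spaces only line the code up, since LaTeX collapses them anyway.

```latex
\begin{tabular}{l c r p{3cm}}
% l = left, c = centered, r = right, p{3cm} = paragraph column 3 cm wide
left & centered & right & long text in a p column wraps onto a new line \\
a    & bb       & ccc   & short \\
\end{tabular}
```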

Now let us add horizontal and vertical borders. For the horizontal lines we will use the booktabs commands \toprule, \bottomrule and \midrule. The first two should appear only once each in the table, while the last one can be used as many times as needed. Here is how it looks as raw text:
\begin{tabular}{l l}
\toprule
column 1, row 1 & column 2, row 1 \\
\midrule
column 1, row 2 & column 2, row 2 \\
\bottomrule
\end{tabular} 
Here is what you will see when you process the commands:

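The \toprule, \midrule and \bottomrule commands come from the booktabs package. As far as I know, the default LaTeX template behind R Markdown loads booktabs only when the document also contains Markdown tables, so a raw tabular on its own may stop with an "Undefined control sequence" error. A sketch of a fix, assuming a file named preamble.tex (a hypothetical name) that you pass to bookdown::pdf_book through the includes: in_header: option in your YAML header; a header-includes: entry in the YAML header is another way to add the same line.

```latex
% preamble.tex (hypothetical file name), added to the LaTeX preamble
% through the in_header option of bookdown::pdf_book.
% booktabs defines \toprule, \midrule, \bottomrule and \cmidrule;
% loading it again when the template already did is harmless.
\usepackage{booktabs}
```
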
Now we need to add vertical borders. They are less easy to spot in the code because they are added to the alignment options: instead of {l l} we type {|l|l|}:

\begin{tabular}{|l|l|}
\toprule
column 1, row 1 & column 2, row 1 \\
\midrule
column 1, row 2 & column 2, row 2 \\
\bottomrule
\end{tabular}
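
Since captions were the reason I started all this, here is a sketch of a captioned table. Wrapping the tabular in a table environment gives it a caption and a label, and because the table environment is closed explicitly, the text that follows stays outside the table. The label tab:basic is just an example name, and the sketch assumes booktabs is loaded. I left out the vertical bars on purpose: booktabs adds a little space above and below its rules, so vertical lines show small gaps where they cross them. If you want a full grid, the plain LaTeX \hline command works with vertical bars without gaps.

```latex
\begin{table}[ht]
  % [ht]: place the table here if possible, otherwise at the top of a page
  \centering
  \caption{A basic table with a caption}
  \label{tab:basic} % \label must come after \caption
  \begin{tabular}{l l}
    \toprule
    Header 1 & Header 2 \\
    \midrule
    column 1, row 1 & column 2, row 1 \\
    column 1, row 2 & column 2, row 2 \\
    \bottomrule
  \end{tabular}
\end{table}

Table \ref{tab:basic} can now be referenced by its number.
```

Keep in mind that table is a floating environment, so LaTeX may move it to a better spot on the page; the caption and the label travel with it.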

This LaTeX code can be used as is in Rmarkdown if the output format in your YAML metadata (header) is PDF, for example bookdown::pdf_book. Hope it helps!

Over-parameterization: a preprint about a geometric approach to estimating the number of parameters for Large Language and Image Processing Models

I learned from the book "Understanding Deep Learning" by Simon Prince how current Large Language and Image Processing (Object Detection and Classification, Generation and others) models can benefit from over-parameterization, although at the time nobody knew how this happens. I pondered the question and, being an abstract mathematician by training, thought about something that is usually so useless in Data Science that people as a rule ignore it completely, namely an exact solution. It turned out that in this particular case, searching for an exact solution might be a reasonable approach. Here is my preprint about it: http://dx.doi.org/10.13140/RG.2.2.18776.61442