Title: Dynamic Documents with R and knitr 2nd ed
Authors: Yihui Xie
Edition: 2
Finished Date: 2017-04-29
Rating: 1
Language: English
Genres: Programming, R, Software, RStudio
Level: Entry
Publishers: Chapman and Hall/CRC
Publication Date: 2015-06-24
ISBN: 978-1498716963
Format: Pdf
Pages: 294
Download: Pdf

resources

typographic conventions of the book

• bold text: package names
• italic: function names
• typewriter font: inline code
• sans serif fonts: filenames

the source code produces the output, so only the source code needs to be maintained

1. write a program to do the computing
2. write narratives to explain what is being done by the program code

Technically, literate programming involves three steps, which can be implemented in software packages. The authors control the style of the output.

1. parse the source document and separate code from narratives
2. execute source code and return results
3. mix results from the source code with the original narratives
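The three steps above can be sketched in a few lines of base R. This is only a toy illustration, not how knitr is implemented: the chunk markers `<<begin>>` and `<<end>>` are hypothetical and were chosen just for this example.

```r
# A toy source document: narrative lines plus one code chunk delimited by
# hypothetical markers (these are NOT knitr syntax).
doc <- c(
  "The sum of 1 to 10 is computed below.",
  "<<begin>>",
  "sum(1:10)",
  "<<end>>",
  "End of report."
)

# Step 1: parse the document and separate code from narrative.
begin <- which(doc == "<<begin>>")
end   <- which(doc == "<<end>>")
code  <- doc[(begin + 1):(end - 1)]

# Step 2: execute the code and capture what it prints.
results <- capture.output(eval(parse(text = code)))

# Step 3: mix the results back into the original narrative.
report <- c(
  doc[seq_len(begin - 1)],
  code,
  results,
  doc[(end + 1):length(doc)]
)
print(report)
```

A real tool also has to handle several chunks, plots, errors, and output formatting, which is where a package such as knitr comes in.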

Ch 2. reproducible research

Reproducible research (RR) is one possible by-product of dynamic documents, but dynamic documents do not absolutely guarantee RR.

example: Monte Carlo simulation

• Suppose we ran a simulation with a certain random seed and got a good estimate of a parameter, but the result was actually due to a “lucky” random seed.
• Although we can strictly reproduce the estimate with that seed, it is not reproducible in the general sense. Similar problems exist in optimization algorithms, e.g., different starting values can lead to different roots of the same equation.
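The "lucky seed" problem can be made concrete with a small simulation. The sketch below estimates pi by Monte Carlo sampling; the function name `estimate_pi` and the sample size are assumptions made for this illustration and do not come from the book.

```r
# Estimate pi from n random points in the unit square: the share of points
# falling inside the quarter circle approximates pi / 4.
estimate_pi <- function(seed, n = 1000) {
  set.seed(seed)
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

# The same seed always reproduces exactly the same estimate ...
same_seed_same_result <- identical(estimate_pi(42), estimate_pi(42))

# ... but different seeds give different estimates, so reporting only the
# best one would overstate the accuracy of the method.
estimates <- vapply(1:20, estimate_pi, numeric(1))
best_seed <- which.min(abs(estimates - pi))

print(same_seed_same_result)
print(range(estimates))
print(best_seed)
```

Reporting the spread over many seeds, rather than the single best run, is closer to reproducibility in the general sense.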

2.1 literature

the term reproducible research

• first proposed: Jon Claerbout @ Stanford University (Fomel and Claerbout, 2009)
• the final product of research is not only the paper itself, but also the full computational environment used to produce the results in the paper such as the code and data necessary for reproduction of the results and building upon the research.
• RR is often related to literate programming (Knuth, 1984)

• recommendation

• source files

• put them under the same directory
• use relative paths whenever possible
• Do not change the working directory after the computing has started

set working directory in the very beginning of an R session

the working directory is set to the directory of the source document before knitr is called to compile documents.

• Compile the documents in a clean R session

at the end of the project:

1. start a new R session
2. compile the report in batch mode
3. this way, all the results are freshly generated from the code

• Avoid commands that require human interaction

write the filename explicitly instead of picking the file interactively (e.g., with file.choose())

read.table('a-specific-file.txt')

• Avoid environment variables for data analysis

because they require additional instructions for users to set up, and humans can simply forget to do this. If there are any options to set up, do it inside the source document.

• attach to the report:

• the output of sessionInfo() or devtools::session_info()
• instructions on how to compile the document

it is better to provide the instructions in the form of a computer script; e.g., a shell script, a Makefile, or a batch file.

2.3 barriers of reproducible reports

• data can be huge
• confidentiality of data
• outdated software versions
• documents may compile differently on different operating systems
• competition

one may choose not to release the code or data with the report due to the fact that potential competitors can easily get everything for free, whereas the original authors have invested a large amount of money and effort

Ch.3 a first look

installation requirements

• LaTeX

• Windows: MiKTeX
• Mac: MacTeX
• Linux: TeX Live

• HTML: nothing

• Markdown: nothing

knit(): compile source documents

stitch(): create a report from an R script

• knitr provides a template of the source document with some default settings
• Currently it has built-in templates for LaTeX, HTML, and Markdown.

• stitch(): LaTeX

• stitch_rhtml(): HTML
• stitch_rmd(): Markdown

literate programming document

• weaving: compile it to a report (run the code) with knit()
• tangling: extract the program code in it with purl()

the result of tangling is an R script that consists of all code chunks in the source document
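Weaving and tangling can be tried on a tiny document. The sketch below assumes the knitr package is installed (it does nothing otherwise); the document content, the chunk label `demo`, and the temporary file names are made up for illustration.

```r
# Sketch only: runs when the knitr package is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  fence <- strrep("`", 3)  # three backticks, built here to keep the example readable

  # A minimal R Markdown source document with one code chunk.
  src <- tempfile(fileext = ".Rmd")
  writeLines(
    c("# Demo", "", paste0(fence, "{r demo}"), "x <- 1 + 1", "x", fence, ""),
    src
  )

  # Weave: run the code and write a report (Markdown in this case).
  md_file <- knitr::knit(src, output = tempfile(fileext = ".md"), quiet = TRUE)

  # Tangle: extract only the program code into an R script.
  r_file <- knitr::purl(src, output = tempfile(fileext = ".R"), quiet = TRUE)

  woven   <- readLines(md_file)
  tangled <- readLines(r_file)
  print(tangled)
}
```

The woven file contains both the narrative and the results, while the tangled file contains only the code of the chunks.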

Rnw: change options from Sweave to knitr

Ch.4 editors

4.1 RStudio

4.2 LyX

4.3 Emacs/ESS

4.4 other editors

Ch.5 document formats

3 key components of the design of knitr package

1. a source parser
2. a code evaluator
3. an output renderer

parser

1. parse the source document
2. identify computer code and inline code

evaluator

1. execute the code
2. return results

renderer

1. format the results in an appropriate format
2. combine with the original documentation

relationship of the knitr components to the document format

• independent of the document format: evaluator

• have relation to document format:

• parser: input syntax
• renderer: output syntax

5.1 input syntax

regular expressions

• identify

• code blocks, i.e., chunks
• other elements

• inline code
• the patterns are stored in the all_patterns object

• defined in pattern.R

Two more technical notes about the regular expression above:
1. \s denotes a white space in regular expressions, but in R we have to write double backslashes because \\ in an R string really means one backslash (the first backslash escapes the second character, which is also a backslash); the backslash as the escape character can be rather confusing to beginners, and the rule of thumb is, when you want a real backslash, you may need two backslashes;
2. the parentheses () in the regular expression group a series of characters so that we can extract them with back references, e.g., we can extract the second group of characters from abbbc.
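Both notes can be checked directly in base R. The chunk-header pattern used below is an assumption: it resembles the kind of pattern used for Rnw documents, but it is written here only for illustration, and the header string is made up.

```r
# Note 1: "\\s" in R source is a two-character string, a backslash plus "s".
nchar("\\s")

# Note 2: back references. "\\2" refers to the second parenthesised group.
second_group <- gsub("(a)(b+)(c)", "\\2", "abbbc")
print(second_group)

# An assumed chunk-header pattern: optional white space, then <<...>>=
pattern <- "^\\s*<<(.*)>>=.*$"
header  <- "  <<my-label, echo=FALSE>>="

# Does the line look like a chunk header, and what is inside the brackets?
is_header <- grepl(pattern, header)
options_text <- sub(pattern, "\\1", header)
print(is_header)
print(options_text)
```

The captured text is what a parser would then split into the chunk label and the chunk options.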

5.1.1 chunk options

The syntax for chunk options is almost exactly the same as the syntax for function arguments in R

option = value

as long as the option values are valid R code, they are valid to knitr => write arbitrary valid R code for chunk options, which makes a source document programmable

Example

short form: eval=bar<5
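Because option values are ordinary R code, a string of options can be evaluated much like the arguments of a function call. The sketch below illustrates the idea with base R only; it is not knitr's actual parsing code, and the variable `bar` and the option string are assumptions made for this example.

```r
# A variable that an option value refers to (hypothetical).
bar <- 3

# Chunk options written the way they would appear in a chunk header.
option_string <- "eval = bar < 5, fig.width = 2 * 3.5, echo = FALSE"

# Treat the options as the arguments of list() and evaluate them as R code.
opts <- eval(parse(text = paste0("list(", option_string, ")")))
str(opts)
```

Here `eval` becomes TRUE because `bar < 5` holds, so the value of an option can depend on earlier computations in the same document.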

5.1.2 chunk label

chunk label: label = "character"

• data type: character
• quoted, or
• unquoted: knitr quotes it internally

• label= can be omitted when the label is passed by position (as the first option), not by name
• labels must be unique

• if not unique: knitr stops and gives an error

because of the potential danger that the files generated from one chunk may overwrite those of another chunk

• purpose: naming the external files generated from a chunk

• images
• cache files
• if the label is empty: knitr automatically generates a label of the form unnamed-chunk-i, i = 1, 2, 3, …

Example
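The label rules can be mimicked in base R. This is a rough sketch of the behaviour described above, not knitr's own code; the labels, the helper name `check_unique`, and the figure file naming scheme are assumptions made for illustration.

```r
# Labels as written by the author; "" means that no label was given.
labels <- c("setup", "", "plot-cars", "")

# Empty labels get automatic names of the form unnamed-chunk-i.
empty <- labels == ""
labels[empty] <- paste0("unnamed-chunk-", seq_len(sum(empty)))
print(labels)

# Duplicated labels are an error, because files named after the label
# (figures, cache files) would otherwise overwrite each other.
check_unique <- function(labels) {
  dup <- unique(labels[duplicated(labels)])
  if (length(dup) > 0) {
    stop("duplicate chunk label: ", paste(dup, collapse = ", "))
  }
  invisible(TRUE)
}
check_unique(labels)

# Hypothetical file names derived from the labels.
figure_files <- file.path("figure", paste0(labels, "-1.pdf"))
print(figure_files)
```

Calling `check_unique(c("a", "a"))` stops with an error, which mirrors the reason knitr refuses to compile documents with repeated labels.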

5.1.3 global options

opts_chunk, defined in defaults.R

• global options are shared across all the following chunks after the location in which the options are set
• local options in chunk override global options
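The override rule is essentially a merge of two lists, and global options are set through `opts_chunk$set()`. The sketch below shows the merge with base R and then the knitr call, which only runs when knitr is installed; the particular option values are arbitrary examples.

```r
# Base R illustration: local options override global ones.
global_opts <- list(echo = TRUE, fig.width = 7)
local_opts  <- list(echo = FALSE)
effective   <- modifyList(global_opts, local_opts)
str(effective)

# With knitr: set global options, read one back, then restore the old state.
if (requireNamespace("knitr", quietly = TRUE)) {
  old <- knitr::opts_chunk$get()
  knitr::opts_chunk$set(echo = FALSE, fig.width = 6)
  current_width <- knitr::opts_chunk$get("fig.width")
  print(current_width)
  knitr::opts_chunk$restore(old)
}
```

In a real document the `opts_chunk$set()` call usually sits in the first chunk, so that it applies to every chunk that follows.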

5.1.4 chunk syntax

5.2 document formats

code chunks can be indented by any number of spaces in all document formats

5.2.1 Markdown

problem of Markdown: each derivative has its own definition of certain elements, such as tables

CommonMark (http://commonmark.org/) tries to provide a standard for Markdown syntax

Pandoc’s Markdown is largely compatible with the CommonMark standard

5.2.3 HTML

R code is written inside HTML comments in an HTML document

1. create an R HTML file
2. write the code

begin code: <!--begin.rcode
end code: end.rcode-->

chunk options: <!--begin.rcode fig.width=7, fig.height=6

inline codes: <!--rinline pi -->
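The comment syntax above can be tried on a minimal R HTML file. The sketch assumes that knitr is installed and that the file uses the .Rhtml extension; the page content and the chunk label `demo` are made up for illustration.

```r
# Sketch only: runs when the knitr package is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  # A minimal R HTML document with one chunk and one inline expression.
  src <- tempfile(fileext = ".Rhtml")
  writeLines(c(
    "<html><body>",
    "<!--begin.rcode demo, echo=TRUE",
    "1 + 1",
    "end.rcode-->",
    "<p>The value of pi is <!--rinline pi -->.</p>",
    "</body></html>"
  ), src)

  # Compile the document; the HTML comments are replaced by code and results.
  html_file <- knitr::knit(src, output = tempfile(fileext = ".html"), quiet = TRUE)
  html <- readLines(html_file)
  print(html)
}
```

Since the code lives in comments, the uncompiled source file is still valid HTML.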

5.2.4 reStructuredText

page 36

reStructuredText (reST) document: http://docutils.sourceforge.net/rst.html

like Markdown

more powerful, more complicated

5.2.5 AsciiDoc

page 37

https://en.wikipedia.org/wiki/AsciiDoc

5.2.6 Textile

page 37

https://en.wikipedia.org/wiki/Textile_(markup_language)

5.2.7 customization

page 37

use one’s own syntax to parse a source document

knit_patterns: manage regular expressions

override the default syntax: knit_patterns$set()

Example
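A custom syntax is just a list of regular expressions. The sketch below defines a hypothetical syntax (chunks start with `<<r` and end with `r>>`, inline code is wrapped in double braces), checks it against a small document with base R, and then registers it with `knit_patterns$set()` when knitr is installed. The patterns and the sample text are assumptions made for illustration.

```r
# Hypothetical custom syntax, expressed as regular expressions.
custom <- list(
  chunk.begin = "^<<r(.*)",
  chunk.end   = "^r>>$",
  inline.code = "\\{\\{([^}]+)\\}\\}"
)

# A small source document written in that syntax.
src <- c("Some text", "<<r demo, echo=FALSE", "1 + 1", "r>>", "Pi is {{pi}}.")

# Check with base R which lines the patterns pick up.
begin_lines <- grep(custom$chunk.begin, src)
end_lines   <- grep(custom$chunk.end, src)
inline      <- regmatches(src, regexpr(custom$inline.code, src))
print(c(begin_lines, end_lines))
print(inline)

# Register the patterns with knitr, read one back, then restore the old state.
if (requireNamespace("knitr", quietly = TRUE)) {
  old <- knitr::knit_patterns$get()
  knitr::knit_patterns$set(custom)
  active_begin <- knitr::knit_patterns$get("chunk.begin")
  knitr::knit_patterns$restore(old)
}
```

Testing the expressions on sample lines before registering them is a cheap way to catch escaping mistakes.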

when parsing a source document, knitr decides which pattern list to use

1. match the filename extension to a pattern list
2. otherwise, check whether the syntax of the document matches an existing pattern list

5.3 output renderers

• function eval() in base package: execute inline R code

1. parse
2. evaluate

eval(parse(text = "1+1"))

• evaluate code chunks: evaluate package

loop

1. evaluate package, function evaluate()

1. takes a chunk of R source code
2. evaluate
3. return a list. 6 possible classes

• character: normal text output
• source: source code
• warning
• message
• error
• recordedplot: plots
2. knitr package: the object knit_hooks is a list of output hook functions that construct the final output based on the output format

the form of a hook function

• x: raw output from R
• options: a list of current chunk options
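The two halves of this loop can be sketched as follows. The first part assumes the evaluate package is installed and only inspects the classes of the returned elements. The second part defines a hook of the form described above; the function name `hook_output`, the HTML wrapper, and the use of `options$label` are assumptions made for illustration. A function of this shape could be registered with `knit_hooks$set(output = ...)`.

```r
# Part 1: evaluate a chunk of code and look at the classes of the results.
if (requireNamespace("evaluate", quietly = TRUE)) {
  res <- evaluate::evaluate(c("1 + 1", "message('hello')"))
  classes <- vapply(res, function(x) class(x)[1], character(1))
  print(classes)
}

# Part 2: a hook function takes the raw output x and the chunk options,
# and returns the text to be written into the output document.
hook_output <- function(x, options) {
  paste0('<pre class="output" id="', options$label, '">', x, "</pre>")
}

rendered <- hook_output("[1] 2", list(label = "demo"))
print(rendered)
```

Each output class has its own hook, so source code, text output, messages, and plots can each be formatted differently for a given output format.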