Learning RStudio for R Statistical Computing

Did not finish it; lost interest.


Title: Learning RStudio for R Statistical Computing
Authors: Mark P.J. van der Loo, Edwin de Jonge
Edition: 1
Finished Date: 2013-05-31
Rating: 3
Language: English
Genres: Programming, R, Software, RStudio
Level: Entry
Publishers: Packt Publishing
Publication Date: 2012-12-24
ISBN: 978-1782160601
Format: Pdf
Pages: 126
Download: Pdf

Chapter 1: Getting started

The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.

Important tips for maintaining your R installation:

  • Always use the latest stable version. This is the version likely to have the fewest bugs in the older functionality. You can read about the latest features in the news file, for example by running View(news()) from the R command line. See the Installing R section for an easier way to install R.

Update R

  • Frequently update your installed packages. This is simply done by running the update.packages() command from your R console.
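Both tips can be checked from the console. The following is a minimal sketch: the "3.0.0" threshold is an arbitrary value chosen for illustration, and the update call is guarded so that it only runs in an interactive session, since it needs a network connection and a CRAN mirror.

```r
# Report the version of the running R installation.
print(R.version.string)

# Compare against a minimum version (the threshold is illustrative only).
is_recent <- getRversion() >= "3.0.0"
print(is_recent)

# Updating packages needs network access, so this sketch only
# attempts it when R is used interactively.
if (interactive()) {
  update.packages(ask = FALSE)
}
```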

RStudio is an Integrated Development Environment (IDE) for R. The term IDE comes from the software industry and refers to a tool that makes it easy to develop applications in one or more programming languages. Typical IDEs offer tools to easily write and document code, compile and perform tests, and offer integration with a version control tool.

One of the tabs in the bottom right-hand side of RStudio is a package panel that allows you to browse the currently installed packages. These packages can be updated by clicking on Check for Updates. RStudio will check which packages have newer versions and will give you the option to select which of these packages should be updated. Alternatively, you can use the main menu: Tools | Check for Package Updates.

To load the package, scroll down the window with installed packages and check it. The package is now loaded.
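Checking a package's box in the panel has roughly the same effect as calling library() in the console. A small sketch follows; the tools package is used only because it ships with every R installation.

```r
# Names of all installed packages (what the package panel lists).
installed <- rownames(installed.packages())

# Attach a package; "tools" ships with base R, so it should be present.
library("tools")

# An attached package appears on the search path as "package:<name>".
tools_attached <- "package:tools" %in% search()
print(tools_attached)
```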

Tips

Trying to update a package that is currently loaded may fail. The easiest solution is to close and restart RStudio and update again without the package being loaded.

Create a new project

We create an R project using the menu Project | New Project. Choose New Directory and name the project file Abalone.

write.csv(abalone, "abalone.csv", row.names=FALSE)

This sets the correct names for the data set and stores the data in your project directory, so you don’t have to download it again. This data file is part of your compendium.
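The effect of row.names=FALSE can be seen with a round trip through a temporary file. The tiny data frame below is a made-up stand-in for the abalone data, not the real measurements.

```r
# Hypothetical stand-in for the abalone data (values invented for illustration).
abalone_demo <- data.frame(
  Sex    = c("M", "F", "I"),
  Length = c(0.455, 0.530, 0.330)
)

# Write without the row-name column, then read the file back.
csv_path <- tempfile(fileext = ".csv")
write.csv(abalone_demo, csv_path, row.names = FALSE)
restored <- read.csv(csv_path, stringsAsFactors = FALSE)

# Only the two real columns come back; there is no extra row-name column.
print(names(restored))
```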

plot(Length ~ Sex, data = abalone) draws a boxplot.
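The boxplot appears when the variable on the right-hand side of the formula is a factor. The sketch below uses a small invented data frame in place of the real abalone data. Note that since R 4.0.0, read.csv() no longer converts strings to factors by default, so with a recent R the Sex column may need an explicit factor() call.

```r
# Hypothetical stand-in for the abalone data (values invented for illustration).
abalone_demo <- data.frame(
  Sex    = factor(c("M", "F", "I", "M", "F", "I")),
  Length = c(0.455, 0.530, 0.330, 0.350, 0.545, 0.425)
)

# With a factor on the right-hand side, the formula method of plot()
# draws one box per factor level.
plot(Length ~ Sex, data = abalone_demo)

# The explicit call gives the same kind of figure.
boxplot(Length ~ Sex, data = abalone_demo)
```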

  1. Choose File | Compile Notebook.
  2. Close the Abalone project with Project | Close Project. Choose Save.
    We now have a new, empty RStudio session.
  3. Open your newly created Abalone project by navigating to Project | Recent Projects | Abalone.

Ask a question

When you post a question, it helps a lot to include a small example that reproduces your problem. Also, you may want to attach the output of R’s sessionInfo() command to show in what context the problem occurred. Finally, it can be helpful if you attach RStudio’s log file. You can find the folder where it is stored by opening Help | Diagnostics | Show log files. If RStudio fails to start, you can find it in the following folder:

Operating system    Folder path
Windows XP          %USERPROFILE%\Local Settings\Application Data\RStudio-Desktop\log
Windows Vista, 7    %localappdata%\RStudio-Desktop\log
Linux, Mac OS X     ~/.rstudio-desktop/log/
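A question could be accompanied by something like the following: a few lines that reproduce the issue, followed by the session details. The "problem" shown here is made up purely for illustration.

```r
# A small, self-contained example that anyone can paste and run.
x <- c(1.5, NA, 3.0)
result <- mean(x)   # the missing value makes the mean NA
print(result)

# Session details: R version, platform, and attached packages.
info <- sessionInfo()
print(info)
```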

Further reading

The paper Statistical Analyses and Reproducible Research by Robert Gentleman and Duncan Temple Lang offers a thorough description of methods for reproducible research. It can be downloaded for free from http://biostats.bepress.com/bioconductor/paper2/.

There are many books for learning about R, a lot of which are dedicated to specific subjects. Two recent books that discuss R in general and have quickly gained popularity are R in a Nutshell by Joseph Adler, 2010, O’Reilly, and The Art of R Programming by Norman Matloff, 2011, No Starch Press, Inc. The former book discusses R as a language as well as many statistical features, while the latter thoroughly discusses R as a programming language. Two books focusing on general statistics with R are worth mentioning here as well. The first is Introductory Statistics with R (2nd ed. 2008, Springer) by Peter Dalgaard. The second is Introductory Probability and Statistics Using R by G. Jay Kerns. The latter book is developed as an open source project and can be downloaded from http://ipsur.org/.

To keep up to date on what happens in the R community, we highly recommend frequent visits to Tal Galili’s http://r-bloggers.com. This website collects a large number of R-related blogs in a convenient newspaper-like layout. Subscribing with an RSS reader for smartphone or PC is also possible.

Chapter 2: R scripts and consoles

The most important shortcuts to remember are Ctrl+1 to move to the source editor and Ctrl+2 to move to the console. The following table lists the number to combine with Ctrl for each panel:

Shortcut number    Focus

  1. Source editor
  2. Console
  3. Workspace browser
  4. History editor
  5. File browser
  6. Plots area
  7. Packages
  8. Help
  9. Git/SVN version control

Command history

RStudio offers three ways to retrieve and restore all the commands that you entered.

The first is by scrolling through your commands by hitting the up or down arrow keys, when in the console. Previous commands are shown on the command line one by one. Press Enter to execute the current command or Esc to return to an empty line.

The second way to scroll through your command history is to press Ctrl+up. This opens a popup screen showing previously given commands. You can select a command with the up and down keys or by clicking on it with the mouse. Press Enter to copy the selected command to the console, and hit Enter again to execute it.

The third and the most extensive way to inspect or alter the command history is by using the command history panel. The command history panel is situated in the top right-hand side panel, under the second tab. You can activate it by pressing Ctrl+4.

Commands can be re-executed by selecting them and pressing Enter, or by clicking the To Console button at the top of the panel. The commands will be copied to the console, executed, and then focus is set to the console.

Commands can be deleted from the history by pressing the Delete button (with the white cross in the red circle) at the top of the panel. Alternatively, the entire history may be deleted by pressing the broom button next to it.

The entire command history can be saved by clicking on the Save button (with the image of the blue floppy disk) at the top of the panel. The commands are stored with the extension .Rhistory. In the spirit of openness, this file is a simple text file with R commands. So even if you uninstall RStudio, your command history is available to be edited with any text editor, or to be sourced by R. Previously saved command histories can be loaded using the load history button (with the folder icon) on the left-hand side.

Loading and saving command histories is not the recommended way to make your analyses reproducible. When working in the console, one typically repeats or alters commands on-the-fly, making a command line history difficult to read. If you performed an analysis that you want to reproduce, there is a better way to do so: by saving it as a source file.

Selected commands can be copied to a source file by clicking on the To Source button at the top of the history panel. If no source file was open yet, a new one will be opened for you. This way you may edit the commands into a real script and store them as a .R file, which is the usual approach for automating analyses.

Your history file typically contains many copies of a command. RStudio can remove all duplicated history entries automatically. This can be set in Tools | Options | R General.
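Because a saved history is plain text, it can be handled like any other file of R code. The sketch below fakes a small history file in a temporary location (the commands are invented), drops duplicate lines, and sources the result. It works on the text only and does not touch RStudio's own history or its duplicate-removal option.

```r
# Create a stand-in history file; the commands are invented for illustration.
history_path <- tempfile(fileext = ".Rhistory")
writeLines(c("x <- 2", "y <- x^2", "x <- 2", "y <- x^2"), history_path)

# Read the commands and keep only the first occurrence of each one.
commands <- unique(readLines(history_path))
print(commands)

# Run the de-duplicated commands in a separate environment.
script_path <- tempfile(fileext = ".R")
writeLines(commands, script_path)
history_env <- new.env()
source(script_path, local = history_env)
print(history_env$y)
```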

Command completion

Activating command completion is very easy: just type the beginning of what you aim to type and hit Tab. RStudio can complete functions and function arguments, objects in the R environment, and filenames (strings). Finally, there is bracket completion, which is performed automatically without pressing Tab. Each completion feature is discussed separately in the following section.

Type s in the console and hit Tab. After pressing Tab, a pop-up menu shows completion options.

  1. RStudio shows a pop-up menu with possible completion options that may include variables from the workspace or names of (possibly self-defined) functions. You can scroll through the options using the up and down arrow keys. Pressing Tab again (or Enter or Right) completes the command and closes the pop-up screen.
  2. Behind the function name in the pop-up menu, the name of the package containing the function is displayed. Alongside the list is the Description and Usage portion of the R help file that comes along with the function. Pressing F1 opens the whole help file for that function in RStudio’s help browser.
  3. Once a function name is completed, type an opening bracket “(” and hit Tab. RStudio opens a popup with the function arguments and their descriptions from the function’s help file. Pressing Tab (or Enter or right arrow key) copies the selected argument and equals symbol to the command line and closes the popup.

Object completion

The Tab completion functionality attempts to complete a non-finished command in any way possible, including names of objects and functions defined by the user in R’s workspace. Moreover, for objects that allow R’s dollar operator, tab expansion of subobjects is available as well. The most important and useful examples thereof are data.frame and list objects, as it is very common to make typing errors in names of data.frames. As an example, load the iris dataset by typing the following in the console:
data(iris)

To select a column, type iris$ and hit Tab. A popup with a list of columns in the iris data.frame appears for selection.
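The names offered in the popup are the column names of the data frame, which can also be listed directly. A short sketch:

```r
data(iris)

# The column names are what the completion popup offers after typing iris$.
print(names(iris))

# Selecting one column with the dollar operator.
sepal_length <- iris$Sepal.Length
print(head(sepal_length))
```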

For the advanced user, completion using the Tab key also works for instances of self-defined S4 objects for which the dollar operator has been overloaded.
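As a rough illustration of what overloading the dollar operator means, the sketch below defines a hypothetical S4 class named Record (the class, its slot, and the field names are all invented). It only shows the operator itself; whether the completion popup lists the fields depends on RStudio.

```r
library(methods)

# A hypothetical S4 class that stores named values in a list slot.
setClass("Record", representation(fields = "list"))

# Overload the dollar operator so that rec$name looks up the list entry.
setMethod("$", "Record", function(x, name) x@fields[[name]])

rec <- new("Record", fields = list(speed = 12, length = 3.5))
print(rec$speed)
```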

Completion of filenames

Entering long path and filenames can be a nuisance. Fortunately, RStudio also completes strings into filenames. To try this, just enter a single or double quote at the command line and hit Tab. A popup with file and directory names in RStudio’s current working directory is shown. For partially completed strings, completions are suggested from the partially completed path in the string. If you are working in an RStudio project, the completion assumes that paths are relative to the project directory. It is a good idea to use paths relative to your project directory, as it allows you to effortlessly move your whole project.

Recall that R expects a forward slash “/” to indicate levels in a directory structure. As a mnemonic, you may think of the “address” of your file as a sort of web address (URL) that also uses forward slashes. Forward slashes are also common in Unix-like systems and Mac OS X (which is Unix-like at its core). Alternatively, under Windows, one forward slash may be replaced by two backslashes “\\”.
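The two notations can be compared directly in the console. In this sketch the directory and file names are hypothetical; the point is only how the separators are written inside an R string.

```r
# Build a relative path; file.path() joins the parts with forward slashes.
path_forward <- file.path("data", "abalone.csv")

# The same path written Windows-style: each backslash must be doubled
# inside an R string, because a single backslash starts an escape sequence.
path_backslash <- "data\\abalone.csv"

# The doubled backslash is stored as a single character.
print(nchar(path_backslash))

# Converting backslashes to forward slashes gives the portable form.
converted <- gsub("\\", "/", path_backslash, fixed = TRUE)
print(converted)
```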

Keyboard shortcuts for the console

Many shortcuts that are common in text editors are supported by RStudio, including Ctrl+left/right arrow keys to jump a word, Shift+left/right arrow keys for selection, and Home and End to jump to the beginning or end of a line. Below is a table of shortcuts for the R console; some of them will be familiar to users of Unix shells.

Windows & Linux        Mac                       Description
Tab (or Ctrl+space)    Tab (or Command+space)    Command completion
Esc                    Esc                       Interrupt current command
Ctrl+up                Command+up                Command history popup
Up/down arrow keys     Up/down arrow keys        Scroll through history
Ctrl+L                 Command+L                 Clear console

Features of the source editor

The editor panel of RStudio supports editing several file formats such as HTML, Sweave, Markdown, C, C++, and JavaScript files.

Every code completion feature described in the previous section also works in the source editor.

A few words on code quality

A basic rule of thumb is Don’t Repeat Yourself (DRY). As soon as you have to write a line of code two or three times, write a loop or a function.

“Premature optimization is the root of all evil.”

This quote by famous computer scientist Donald Knuth tells you that at least in the beginning of your project, the most important feature is that your code works the way it should, and that you can read and understand it exactly. If you DRY and write functions, it is simple to replace a slow and simple function with a fancy fast one.
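The idea of swapping a simple function for a faster one can be sketched as follows. Both function names are invented for this illustration; the first version is a plain loop, the second has the same interface but relies on the built-in colMeans().

```r
# First version: a plain loop that is easy to read and check.
column_means <- function(df) {
  out <- numeric(ncol(df))
  names(out) <- names(df)
  for (i in seq_along(df)) {
    out[i] <- mean(df[[i]])
  }
  out
}

# Later replacement: same interface, built on the vectorised colMeans().
column_means_fast <- function(df) {
  colMeans(df)
}

# Code that calls the function does not need to change.
measurements <- iris[, 1:4]
print(column_means(measurements))
print(all.equal(column_means(measurements), column_means_fast(measurements)))
```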

Use indentation to separate blocks such as for-loops and if-then-else statements. RStudio will do this automatically for you, and it is bad practice to ignore or undo the automatic indentation. Use meaningful variable and function names. The name of a variable should reflect the meaning of its content (for example speed, length). For functions, imperatives describing the action a function carries out are often a good choice (for example downloadAbalone()).

In the ideal case, code is understandable without adding comments. However, some complicated pieces of code may need some clarifying remarks. In that case describe what the code is aimed to do, not how it does it. Realize that just like code, comments have to be maintained. So writing code that is readable without comments can save you a lot of time when fixing bugs or updating your compendium. It is better to have no comments than comments that are wrong.

Editing R scripts

new file. The shortcut for Save current document is very useful to memorize as well.

Windows/Linux                     Mac                        Description
Ctrl+Shift+N                      Command+Shift+N            Opens a new R script file*
Ctrl+S                            Command+S                  Saves current document
Ctrl+W                            Command+W                  Closes current document**
Ctrl+O                            Command+O                  Opens document dialog
Ctrl+Up / Ctrl+Alt+right arrow    Ctrl+Option+right arrow    Moves one tab to the right
Ctrl+Up / Ctrl+Alt+left arrow     Ctrl+Option+left arrow     Moves one tab to the left
Ctrl+Shift+C                      Command+Shift+C            Comment/uncomment selection or current line

Commenting code

Often during the development of scripts, it can be useful to comment and uncomment lines of code. In RStudio this can be done by selecting code and choosing Code | Comment/Uncomment lines (Ctrl+Shift+C). Note that activating and deactivating code with comments should not be part of your final workflow, because it makes your actions non-reproducible. A better option is to split the code into functions and/or multiple files.

In more mature scripts it is good practice to add comments that explain parts of your code. Editing these descriptions can result in very long comment lines. RStudio can reformat comment lines with Ctrl+Shift+/.

Do not reformat comments on commented code, as the inserted newline characters may break or change the working of your code after you uncomment it.

Find and replace

RStudio offers basic find-and-replace functionality. Ctrl+F allows you to search for a text string within the currently open document. Typing the string and hitting Enter takes you to the first occurrence of the text. By default, searching is case insensitive, but this can be changed by selecting Match Case. It is also possible to use regular expressions (Regex) for searching and replacing text.

Find and replace using Regex is similar to the gsub function in R with perl=TRUE.

With Ctrl+Shift+F it is possible to search in multiple files. By default RStudio searches in the current working directory and its subdirectories, but a different location can be specified. Searching in multiple files results in an extra tab in the console panel named Find in Files. This panel lists all the occurrences of the search string. Clicking on an occurrence opens the file at the right location in the script editor.
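The comparison with gsub can be tried in the console. A brief sketch with invented input strings; note that inside an R string every backslash of the pattern has to be doubled, which is a property of R strings and not necessarily of RStudio's search box.

```r
# Swap the two parts around the "@" using capture groups and backreferences.
swapped <- gsub("(\\w+)@(\\w+)", "\\2 at \\1", "user@example", perl = TRUE)
print(swapped)

# The inline flag (?i) makes a Perl-style pattern case insensitive.
fixed_case <- gsub("(?i)abalone", "Abalone", "ABALONE data", perl = TRUE)
print(fixed_case)
```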

Folding, sectioning, and navigation

For easy editing and code inspection, the appearance of code in the editor can be customized. Code folding allows you to temporarily hide user-defined sections or indented blocks (functions, loops, and so on). RStudio also offers shortcuts and menus that allow for quick navigation between blocks and sections.

Code folding

Long scripts with many blocks of code can be hard to read. Often this is an indication that the script should be split into multiple files, but alternatively RStudio has a code folding feature that allows you to collapse blocks of code.

All the blocks with curly brackets ({}) and code sections (see the following code snippets) can be folded. All foldable code is preceded by a small triangle. Clicking on the triangle collapses or expands a code block. A collapsed block can also be recognized by the gap in the line numbers.

Keyboard shortcuts for code folding

Windows / Linux    Mac            Description
Alt+L              Alt+L          Folds selection
Shift+Alt+L        Shift+Alt+L    Unfolds selection
Alt+A              Alt+A          Folds all
Shift+Alt+A        Shift+Alt+A    Unfolds all

Code navigation

RStudio has many smart code navigation features that can make code editing faster.

RStudio lets you go to a specific line number (Ctrl+G), but since line numbers are shown, you won’t use this feature a lot.

With Code | Jump To… (Alt+Shift+J) it is possible to jump to functions and code sections within the current file. RStudio shows the available destinations at the bottom of the window. A related navigation feature is selecting the name of a function and hitting the F2 key. RStudio will then open the file with the function definition. This even works for functions from base R and R extension packages.

Function definitions without curly braces (often used for simple one-line function definitions) will not be found by the jump-to function.

Even more useful is the Code | Go to File/Function… (Ctrl+.) option. It helps to quickly locate and load functions in your script files. RStudio will show all the available functions and files in the current working directory and its subdirectories that start with the characters you type. Next to each function name is the script file in which it is defined.

When writing multiple R scripts simultaneously and jumping between files, it is easy to lose track of changes. RStudio allows you to navigate between files using Back (Ctrl+F9) and Forward (Ctrl+F10). RStudio remembers the positions where edits were made and facilitates jumping between them.

Keyboard shortcuts for code navigation

Windows/Linux    Mac            Description
Alt+Shift+J      Alt+Shift+J    Jump to function definition (user defined)
Ctrl+.           Ctrl+.         Go to File/Function
F2               F2             Show function definition
Ctrl+F9          Ctrl+F9        Back
Ctrl+F10         Ctrl+F10       Forward

Code sections

Code sections are not an R feature but an RStudio feature. You can structure your R code by partitioning your scripts into sections. Sections are still valid R, because they are implemented as comments with a special syntax.
The syntax for a section is as follows:
# <sectionname> ----

Here <sectionname> is the name that you want to assign to a section. A section can also be inserted from the RStudio menu: Code | Insert Section (Ctrl + Shift + R). RStudio will ask you to name your section and insert the comment with the section name.
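Since sections are ordinary comments, a script that uses them runs unchanged in plain R. A minimal sketch follows, with invented section names and values. It ends each header with four dashes, which is the form RStudio's documentation describes (a comment ending in at least four dashes, equal signs, or pound signs).

```r
# Load data ----
# Hypothetical measurements, invented for illustration.
shell_lengths <- c(0.455, 0.350, 0.530)

# Summarise ----
mean_length <- mean(shell_lengths)
print(mean_length)
```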

Code completion of your code in the editor window will only work if the objects are available in your workspace. Make sure that you execute the assignment of objects in the editor.

Executing a script file line by line is tedious, so RStudio makes it easy to execute (or source) all the lines of a script file with Ctrl+Shift+Enter. This will copy all the lines to the console and execute them. The output of the script is printed in the console window. Note that RStudio treats the execution of all the lines as one statement.

It is also possible to source the current file without printing statements in the console. This can be done with Ctrl+Shift+S. RStudio makes this even easier with the Source on Save option that is on top of the editing window. Whenever you save your file, it is automatically sourced. This ensures that your workspace always contains the latest version of your objects and functions.
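The difference between the two ways of running a whole file resembles the echo argument of R's source() function. The sketch below writes a tiny invented script to a temporary file and sources it both ways; that RStudio issues exactly these calls is an assumption, not something the notes confirm.

```r
# Write a tiny script to a temporary file (contents invented for illustration).
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)", "total"), script_path)

# echo = TRUE prints each statement and its result while running the file.
echo_env <- new.env()
source(script_path, local = echo_env, echo = TRUE)

# Without echo the file runs silently; top-level values are not auto-printed.
quiet_env <- new.env()
source(script_path, local = quiet_env)
print(quiet_env$total)
```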

Don’t use Source on Save on scripts that take a long time to run. It can be frustrating to wait a long time when changing and saving a file.

Keyboard shortcuts for code execution

Windows / Linux     Mac                 Description
Ctrl+Enter          Command+Enter       Runs current selection or line
Ctrl+Shift+P        Command+Shift+P     Re-runs last executed code
Ctrl+Shift+Enter    Command+Option+R    Runs whole current document
Ctrl+Alt+F          Command+Option+F    Runs current function definition
Ctrl+Alt+B          Command+Option+B    Runs from first to current line
Ctrl+Alt+E          Command+Option+E    Runs from current line to end