Introduction to R

Three-day Workshop


R is a free software environment for scientific and statistical computing and graphics that runs on all common computing platforms. With an active and highly skilled developer community working on development and improvement, it has become an environment of choice for the implementation of new methodologies while attracting attention from statistical application area specialists. The powerful and innovative graphics abilities available in R include the provision of well-designed publication-quality plots.

This workshop was motivated by recent discussions with the Australian Department of the Environment and Education. It is intended for both new and experienced statisticians wishing to learn R. No prior knowledge of statistics or R is assumed for this workshop.


Workshop Contents

The workshop will consist of six sessions over three days, covering the following topics:

Day 1

Session 1 – The Basic R Environment

  • Introductory Material

    • What is R? What does it look like? How does it operate? What are the basic functions? What else can it do?

  • Operating in the R environment

    • Data science definitions, processes for doing data science, understanding and preparing data, how to use the console, introducing the R programming language

  • Aspects of procedural programming

    • Creating R functions, customising code, using packages

  • R and Big Data

    • The big data ecosystem

Session 2 – R Data Types

  • Basic data types

    • R data structures, character data types, lists, recoding data

  • Creating matrices and using them for basic computations

    • Analysing data to illustrate the use of matrices

  • Creating, subsetting and comparing:

    • Factors and categorical data

    • Lists

  • Creating, subsetting, and reordering data frames

  • Creating and naming vectors to analyse data, selecting elements to compare different vectors
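
As a rough flavour of the Session 2 topics, the following minimal sketch creates a named vector, a matrix, a factor, a list and a data frame in base R. All object names and values are made up for illustration and are not taken from the workshop materials.

```r
# Illustrative values only; every name and number below is hypothetical.

# A named numeric vector, and selecting elements by a condition
heights <- c(anna = 165, ben = 180, cara = 172)
tall <- heights[heights > 170]

# A 2 x 3 matrix (filled column by column) and a basic computation on it
m <- matrix(1:6, nrow = 2, ncol = 3)
col_totals <- colSums(m)

# A factor for categorical data, with an explicit level order
grade <- factor(c("low", "high", "low", "mid"),
                levels = c("low", "mid", "high"))
grade_counts <- table(grade)

# A list can hold elements of different types and lengths
record <- list(id = 1L, tags = c("a", "b"))
n_tags <- length(record$tags)

# A data frame, reordered from tallest to shortest
df <- data.frame(name = names(heights),
                 height = unname(heights),
                 stringsAsFactors = FALSE)
df_sorted <- df[order(df$height, decreasing = TRUE), ]
```

Here order() returns the row positions that sort the height column, and indexing the data frame with those positions reorders its rows.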

Day 2

Session 3 – R Input/Output

  • File I/O functions

  • Importing and exporting different file types

  • Inputting and manipulating data

    • Generating output to external files

  • Connecting R to databases

  • Reading data from files
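
To give a sense of the file I/O topics in Session 3, here is a minimal sketch that writes a small data frame to a CSV file and reads it back with base R. The data and the temporary file path are assumptions chosen for illustration; the workshop may use other file types and functions.

```r
# Hypothetical data and a temporary file, used only for illustration.
scores <- data.frame(id = 1:3, score = c(72.5, 88.0, 91.2))

# Export to CSV without the automatic row-name column
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Import the same file back into a new data frame
scores_back <- read.csv(path)

# Remove the temporary file once finished
unlink(path)
```

Setting row.names = FALSE keeps the exported file to the two original columns, so the re-imported data frame has the same column names as the original.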

Session 4 – Data Visualisation

  • Visualising data

    • Plotting trends of multiple indicators using R code, creating complex visualisations of results, plotting confidence intervals, manipulating data, identifying individual characteristics that explain data

  • R Package: graphics

    • plot(), barplot(), hist(), boxplot()

  • R Package: stats

    • heatmap()

  • R Package: ggplot2

    • qplot(), ggplot()

  • Presenting visual results

  • Different ways of visualising data

    • Graphs vs maps, interactive maps for displaying variations
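
The base graphics functions named above can be tried with a few lines such as the following sketch. It uses mtcars, a dataset that ships with R; choosing that dataset and these particular variables is our assumption, not part of the workshop materials.

```r
# mtcars is a built-in R dataset; its use here is purely illustrative.

# Histogram of fuel economy (hist() also returns the bin counts)
h <- hist(mtcars$mpg, main = "Fuel economy", xlab = "Miles per gallon")

# Box plots of fuel economy grouped by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot of weight against fuel economy
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Bar plot of the number of cars in each cylinder group
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")
```

When run non-interactively, R sends these plots to a file rather than a window.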

Day 3

Session 5 – Statistical Analysis

  • Statistical operations

  • Common data mining

  • Descriptive statistics

    • Variables and types, standard and statistical functions, closures, more complex descriptive analyses using survey data, dealing with NA

  • Introducing linear regression and classification: fitting a regression, graphing the regression line, interpreting the model coefficients, interactions, and confounding factors

  • Decision trees
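
As an indication of the Session 5 material, the sketch below handles a missing value in a descriptive statistic and fits a simple linear regression with lm(). The data are made up, and y is deliberately constructed as an exact linear function of x so that the fitted coefficients are easy to check.

```r
# Made-up data for illustration; y is constructed as exactly 2 * x + 1.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1

# Fit a simple linear regression of y on x and extract the coefficients
fit <- lm(y ~ x)
coefs <- coef(fit)   # intercept and slope, approximately 1 and 2 here

# Descriptive statistics with missing values: na.rm = TRUE drops the NA
values <- c(4, NA, 6)
mean_ignoring_na <- mean(values, na.rm = TRUE)
```

Without na.rm = TRUE, mean() returns NA whenever the input contains a missing value, which is a common surprise when first working with survey data.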

Session 6 – Text Analysis

  • Text processing

    • tm package

    • wordclouds

  • Introduction to clustering

Program Format

The workshop will adhere to the following format. Please note that morning tea, lunch, and afternoon tea are catered on all three days, so please be sure to include any dietary requirements on your registration form.

Day 1

8:30 - 9:00          Registration
9:00 - 10:30        Lecture 1
10:30 - 11:00      Morning Tea
11:00 - 12:30      Practical 1
12:30 - 1:30        Lunch
1:30 - 3:00          Lecture 2
3:00 - 3:30          Afternoon Tea
3:30 - 5:00          Practical 2

Day 2


9:00 - 10:30        Lecture 3
10:30 - 11:00      Morning Tea
11:00 - 12:30      Practical 3
12:30 - 1:30        Lunch
1:30 - 3:00          Lecture 4
3:00 - 3:30          Afternoon Tea
3:30 - 5:00          Practical 4

Day 3


9:00 - 10:30        Lecture 5
10:30 - 11:00      Morning Tea
11:00 - 12:30      Practical 5
12:30 - 1:30        Lunch
1:30 - 3:00          Lecture 6
3:00 - 3:30          Afternoon Tea
3:30 - 5:00          Practical 6

Teaching Style

This workshop uses a combination of three teaching styles:

  • Lectures and classroom discussions

  • Small group discussions

  • Computer exercises

During the lecture sessions, the underlying statistical theory will be presented and discussed interactively with the class.

Small Group Discussions

During one of the practicals in this workshop, we will read through a number of application papers from a range of fields (including medicine, education, business, and environmental sciences). For each paper, we will explore the research question being asked, the choice of statistical methods used, and the results obtained and their interpretation.

Computer Exercises

Each practical session will involve the use of laptop computers. Participants are asked to bring their own laptops with R installed, and will use R in individual hands-on exercises throughout the workshop.

Please note that a copy of R will be given to all participants at the start of the workshop via USB. Participants whose laptops do not have a USB port will be responsible for downloading the software prior to the workshop. Participants may also wish to download a copy of RStudio, an open-source integrated development environment (IDE) for R.

Instructor

Dr Mark Griffin is the Founding Director of Insight Research Services Associated (www.insightrsa.com), and holds academic appointments at the University of Queensland and the University of Sydney. Mark is the Chair of the IIBA Business Analytics Special Interest Group and the IIBA Asia-Pacific Regional Director. Mark also serves on the Executive Committee for the Statistical Society of Australia, and is the Chair of their Section for Business Analytics. Mark has previously taught over 80 two-day workshops and 10 five-day workshops in the fields of Business Analytics and Statistics. Major analytics projects that Mark is or has been involved in include:

  • Mark leads a research group at the University of Queensland conducting analysis of incident reports collated by the Queensland Ambulance Service (QAS). The QAS attends approximately 700,000 incidents per year, and QAS staff complete a report detailing each incident. This project uses R for text analytics, market segmentation, and spatial mapping (GIS) (2017 to present).

  • Mark is leading a research group at the University of Queensland that is creating an online sample size calculator in R. This software will be used by managers of medical trials who wish to know how many patients to enrol in their trials. This work is being conducted in partnership with research collaborators at Harvard University. This project uses R for developing a web interface and for the mathematical equations involved (2017 to present).

  • Mark has developed software in R for SeqWater, which monitors the water quality of all 28 water reservoirs in South-East Queensland. This project uses R for developing a web interface and for statistical analysis using time-series data (2017).

  • Mark led a project team evaluating the delivery of the Positive Parenting Program for the Queensland Department of Communities, Child Safety and Disability Services. This included the collection and analysis of data from 140,000 parents and 1,000 practitioners (psychologists) involved in the program. This project used R for statistical analysis and data visualisation (2016-2017).