Project 2

Introduction

The focus of my data collection was the duration of the commutes that Armand and I take to and from university. Because we tend to use different methods of transport, the method was tracked as well. Each observation is one journey to or from university.

When designing the form, we followed guideline 4 by using data validation. The question asking for the journey's duration used response validation so that only numbers between 1 and 120 were accepted and non-numeric characters were rejected. We chose 120 minutes as the maximum because that is roughly how long it takes to get from Hamilton to Auckland in the morning, which I consider the longest journey that still counts as a commute. Every question was made required, because every commute has a start time, a method, and a duration.
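The same rule can also be re-checked in R after the CSV is read, as a safety net in case the form settings ever change. This is a minimal sketch: `is_valid_duration()` is a hypothetical helper that is not part of the project code. It treats any value that parses as a number between 1 and 120 as valid, which is my reading of the rule above.

```r
# Hypothetical helper mirroring the form's duration rule (numeric, 1 to 120)
is_valid_duration <- function(x) {
  # Coerce to numeric; non-numeric text becomes NA (coercion warning suppressed)
  num <- suppressWarnings(as.numeric(x))
  # Valid only if it parsed as a number and lies within the 1-120 range
  !is.na(num) & num >= 1 & num <= 120
}

# Illustrative values, not real form responses
is_valid_duration(c("45", "0", "121", "abc", "120"))  # TRUE FALSE FALSE FALSE TRUE
```

In the report, a line such as `stopifnot(all(is_valid_duration(latest_data$duration)))` placed after `rename()` would make the page stop with a clear error instead of plotting a bad value.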

The data collected will show whether there is a difference between how long my commutes and my friend's commutes take. Because the method is recorded, the data can also be split to show only driving journeys or only public transport journeys. This is useful to me because it helps me judge accurately how much time to allow for getting to uni or home.
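Splitting by method and comparing the groups can be done in one step with dplyr's `group_by()` and `summarise()`. This sketch runs on a few made-up rows. The `commutes` tibble and its values are illustrative, not collected data. With the real data you would pipe `latest_data` in instead.

```r
library(dplyr)

# Illustrative rows only; the real data comes from the published Google Sheet
commutes <- tibble(
  method   = c("Bus", "Car", "Bus", "Car", "Bus"),
  duration = c(40, 25, 50, 35, 30)
)

# One row per method with its journey count and duration summary
by_method <- commutes %>%
  group_by(method) %>%
  summarise(
    journeys      = n(),
    mean_duration = mean(duration),
    min_duration  = min(duration),
    max_duration  = max(duration)
  )

by_method
```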

https://forms.gle/ZJcWS5FopjzUCPMS8

Form design and data collection for this project were completed as a group. My group members were myself and Armand Spencer.

Dynamic report

https://austin-540.github.io/stats220/

Creativity

My project demonstrates creativity by including a dynamic reaction image of my cat, Coco, in the dynamic report. Depending on whether the mean bus commute is longer than the mean car commute, the report shows a different image with a different caption. This uses functions from the magick package, which was covered in module 1.
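The image choice is just a comparison of two means. The sketch below pulls that rule into a small function so it can be checked without downloading the images. `pick_reaction()` is a hypothetical name, and the empty-subset fallback is my own assumption rather than something the report currently does. `mean()` of an empty vector is `NaN`, so if either method had no journeys yet, the comparison would give `NA` and the report's `if()` would stop with an error.

```r
# Hypothetical helper: decides which Coco image to show.
# Same rule as the report (surprised if the mean bus trip is longer, else happy),
# plus an assumed fallback when either set of durations is empty.
pick_reaction <- function(bus_durations, car_durations) {
  bus_mean <- mean(bus_durations)
  car_mean <- mean(car_durations)
  # mean(numeric(0)) is NaN, and is.na(NaN) is TRUE, so this catches empty subsets
  if (is.na(bus_mean) || is.na(car_mean)) {
    return("happy")
  }
  if (bus_mean > car_mean) "surprised" else "happy"
}

pick_reaction(c(40, 50), c(25, 35))  # "surprised" (bus slower on average)
pick_reaction(20, numeric(0))        # "happy" (no car journeys yet)
```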

My report is also creative because it rounds each duration to the nearest 10 minutes before plotting. This produces a chart that is much easier to read than one built from the raw values, which were hard to interpret.
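The rounding step in sections 3 and the dynamic report takes three assignments. A sketch of the same idea in one `mutate()` call, on illustrative rows rather than the real responses:

```r
library(dplyr)

# Illustrative durations in minutes, not real responses
commutes <- tibble(method = c("Bus", "Car", "Bus"), duration = c(37, 22, 44))

# round(x, -1) rounds to the nearest 10; mutate() overwrites the column in place
commutes_rounded <- commutes %>%
  mutate(duration = round(duration, -1))

commutes_rounded$duration  # 40 20 40
```

One thing to be aware of: R's `?round` documentation says that rounding off a 5 is expected to go to the even digit. Because of this, a trip of exactly 25 minutes may be counted in the 20 bar rather than the 30 bar.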

Learning reflection

From module 2 I learned the important idea that we should validate data on the front end as much as possible (while keeping the form accessible) if we want the data to be easy to work with. Working with the data was much easier because I knew that Google Forms would only allow numbers in the duration column and would not allow NA values. I didn't need to do any data cleanup, and I can be fairly confident that the dynamic page won't break because someone entered a bad value.

I am curious to see more of how different types of data manipulation can produce more useful data visualisations and insights. I suspect module 3's filter() function will be very useful for creating subsets of data, which can reveal more interesting values than just the overall mean, min, and max.
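As a preview, dplyr's `filter()` could replace the row-removal loop in section 5 of the appendix with one line per subset. This is a sketch on illustrative rows. Whether module 3 presents it exactly this way is an assumption on my part.

```r
library(dplyr)

# Illustrative rows only
commutes <- tibble(
  method   = c("Bus", "Car", "Bus", "Car"),
  duration = c(45, 30, 35, 20)
)

# filter() keeps only the rows where the condition is TRUE
bus_trips <- commutes %>% filter(method == "Bus")
car_trips <- commutes %>% filter(method == "Car")

mean(bus_trips$duration)  # 40
mean(car_trips$duration)  # 25
```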

I learned that it is important to ensure that you start group work well ahead of the deadline so that you have time to make any changes that may come from different ideas.

This document’s CSS

@import url('https://fonts.googleapis.com/css2?family=Cal+Sans&family=Micro+5&display=swap');

body {background: #001220;}
#header {background: linear-gradient(90deg,rgba(131, 58, 180, 1) 0%, rgba(255, 0, 68, 1) 50%, rgba(252, 176, 69, 1) 100%);color: white;}
.author {font-family: "Micro 5", sans-serif;}
h1, h2, h3 {font-family: "Cal Sans", sans-serif;}

div:not(.main-container) {
  background: rgba(255, 255, 255, 0.8);
  border-radius: 10px; 
  padding: 10px;
  margin: 10px;
}

/* I wish my other courses let me customise my assignments this much */

Appendix

#
# I have split this document into sections so that I'm not repeating a bunch of code and making
# things confusing with my comments at the bottom. The section number is in the comment *after* the code.
#

library(tidyverse)

logged_data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRetKKj9bMRzvsbYOusSWE0uw2oEJIxmDjFP6C2U79SCUlN05jlOH4OHHo5kVT4mpo5BdYi9q9NZh7w/pub?gid=49993136&single=true&output=csv")

latest_data <- rename(logged_data, timestamp = 1, method = 2, duration = 3, start_time = 4)
# ^^^^^ section 1



#View(latest_data)

print(str_glue("There have been {nrow(latest_data)} journeys tracked by this form.")) #Useful
print(str_glue("Min: {min(latest_data$duration)}, Max: {max(latest_data$duration)}, Mean: {mean(latest_data$duration)}")) #The mean is the most useful here

ggplot(latest_data) + geom_bar(aes(x=duration))
#Not very useful

# ^^^^^ section 2

latest_data_durations.rounded <- latest_data$duration %>% 
  round(-1) #Round the durations to the 10s place
latest_data.rounded <- latest_data
latest_data.rounded$duration <- latest_data_durations.rounded

ggplot(latest_data.rounded) + 
  geom_bar(aes(x=duration, fill = method)) + 
  theme_minimal() +
  labs(x = "Duration in minutes - rounded to the closest 10 minutes",
       y = "Number of journeys",
       title = "Durations of tracked commutes to and from campus")
#That's a lot more useful than before

#^^^^^^ section 3

ggplot(latest_data.rounded) + 
  geom_bar(aes(x=method), fill="#000080") +
  theme_minimal() +
  labs(x = "Transportation method",
       y = "Number of journeys",
       title = "Number of journeys to and from campus by transportation method")
#Kind of interesting to see the ratio, but it's not that useful.


#^^^^^^^ section 4

latest_data.bus <- latest_data
latest_data.car <- latest_data
for (i in nrow(latest_data):1) { #I feel like there might be a better way to do this in module 3...
  if (latest_data$method[i] == "Bus") {
    latest_data.car <- latest_data.car[-i,]
  } else if (latest_data$method[i] == "Car") {
    latest_data.bus <- latest_data.bus[-i,]
  }
} #Loop backwards so that removing row i never shifts the rows still to be checked; each row is dropped from whichever data frame doesn't match its method

mean_times <- data.frame(method=c("Bus", "Car"), duration=c(mean(latest_data.bus$duration), mean(latest_data.car$duration)))
ggplot(mean_times) + 
  geom_col(aes(x=method, y=duration), fill="#36fd45") + #Switched to geom_col(): per the ggplot2 docs, geom_bar() counts rows, whereas geom_col() uses the y values as bar heights
  theme_minimal()
#Honestly, right now it doesn't show a big difference, but maybe it will in the future. Either way, it's useful.

#^^^^^^ section 5



# For my report I will use the rounded data chart and mean duration by method chart
# And I will include the number of rows of data collected, and the mean journey time
# This will require the following R code:

logged_data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRetKKj9bMRzvsbYOusSWE0uw2oEJIxmDjFP6C2U79SCUlN05jlOH4OHHo5kVT4mpo5BdYi9q9NZh7w/pub?gid=49993136&single=true&output=csv")

latest_data <- rename(logged_data, timestamp = 1, method = 2, duration = 3, start_time = 4)

print(str_glue("There have been {nrow(latest_data)} journeys tracked by this form.")) 
print(str_glue("Mean: {mean(latest_data$duration)}"))
# and sections 3 & 5

---
title: My dynamic report
output: html_fragment
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, message=FALSE, warning=FALSE, error=FALSE)

```

```{css}
@import url('https://fonts.googleapis.com/css2?family=Cal+Sans&family=Roboto&display=swap');

h2 {
  font-family: "Cal Sans";
}

* {
  font-family: "Roboto";
}

#how-long-are-armand-and-austins-commutes-to-and-from-university {
  background: rgba(255, 255, 255, 0.7);
  backdrop-filter: blur(15px);
  position: relative;
  width: 90vw;
  padding: 20px;
  border-radius: 20px;
  margin: 40px auto;
}

body {
  background-image: url("https://austin-540.github.io/stats220/docs/blob-scene-haikei3.svg");
  background-size: cover;
  background-color: black;
  background-repeat: no-repeat;
} 

img {
  border: 5px solid black;
  margin: 10px;
}

```

## How long are Armand and Austin's commutes to and from university?

```{r}
library(tidyverse)

logged_data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRetKKj9bMRzvsbYOusSWE0uw2oEJIxmDjFP6C2U79SCUlN05jlOH4OHHo5kVT4mpo5BdYi9q9NZh7w/pub?gid=49993136&single=true&output=csv")

latest_data <- rename(logged_data, timestamp = 1, method = 2, duration = 3, start_time = 4)



latest_data_durations.rounded <- latest_data$duration %>% 
  round(-1) #Round the durations to the 10s place
latest_data.rounded <- latest_data
latest_data.rounded$duration <- latest_data_durations.rounded #then insert the rounded durations into a data frame

ggplot(latest_data.rounded) + 
  geom_bar(aes(x=duration, fill = method)) + 
  theme_minimal() +
  labs(x = "Duration in minutes - rounded to the closest 10 minutes",
       y = "Number of journeys",
       title = "Durations of tracked commutes to and from campus.")


latest_data.bus <- latest_data
latest_data.car <- latest_data
for (i in nrow(latest_data):1) { #I'm sure we will cover a much better way of doing this in a future module...
  if (latest_data$method[i] == "Bus") {
    latest_data.car <- latest_data.car[-i,]
  } else if (latest_data$method[i] == "Car") {
    latest_data.bus <- latest_data.bus[-i,]
  } 
} #Loop backwards so that removing row i never shifts the rows still to be checked; each row is dropped from whichever data frame doesn't match its method

mean_times <- data.frame(method=c("Bus", "Car"), duration=c(mean(latest_data.bus$duration), mean(latest_data.car$duration)))
ggplot(mean_times) + 
  geom_col(aes(x=method, y=duration), fill="#16cd25") + #geom_col() rather than geom_bar(): geom_bar() counts rows, geom_col() uses y as the bar height
  theme_minimal() +
  labs(x = "Transportation method", y = "Mean duration in minutes",
       title = "Mean commute duration by method")

```

There have been **`r nrow(latest_data)`** commutes timed in the data. The mean commute length recorded is **`r round(mean(latest_data$duration),0)`** minutes.

***


Coco's (dynamic) reaction to this information:

```{r}
library(magick)

happy_coco <- image_read("https://austin-540.github.io/stats220/docs/coco-happy.jpg") %>% 
  image_resize("400x400") %>% #I am aware that one of those values isn't doing anything
  image_annotate(text = "How I feel speeding past the cars \nin the bus lane", size=20, gravity = "north", boxcolor = "#FFFFFF")

surprised_coco <- image_read("https://austin-540.github.io/stats220/docs/coco-surprised.jpg") %>%
  image_resize("400x400") %>%
  image_annotate(text = "My face when the bus \nthat stops at every bus stop is \nslower than a car that doesn't", size=20, gravity = "north", boxcolor = "#FFFFFF")

if (mean(latest_data.bus$duration) > mean(latest_data.car$duration)) { #If the mean bus journey is longer
  surprised_coco #Then show the surprised image
} else {
  happy_coco
} #If the means are exactly equal then it shows the happy Coco image 
#which I think is an acceptable thing to happen
#because I like seeing happy Coco pictures
```