23 Home Exercises

Instructions

The at-home exercises should be completed using Posit Cloud, Home Exercise 5. Create a new .Rmd file (use File -> New File -> R Notebook).

I advise saving the new file straight away. Because you’ll submit this file as an assignment, use a standardised name: lastname_firstname5. Regularly re-save the file.

When you have finished the exercise (or part of it), ‘knit’ the file. Export the .Rmd and the html as a .zip file, and upload this to the assignment area.

Task

Load the Bellevue Almshouse dataset we used a few weeks ago, which can be found in this project's Files tab. Load the Tidyverse library.

Create a series of charts using ggplot2, visualising the following aspects of the data. In most cases, you’ll need to prepare the data by filtering and/or summarising. Create each one in a separate code cell.

The number of male and female individuals. First filter out any missing or ambiguous data.
The ten most frequent diseases (look back to previous weeks for this one).
The ten most frequent female first names.
The distribution of the values for age, for male and female gender categories.
Create a line chart visualising the number of individuals admitted per week.

Tip

The function floor_date(unit = 'week'), from the lubridate package, will take a full date and round it down to the start of the week (starting on Sunday, because it’s American). So for example, running floor_date on today’s date and tomorrow’s date (2024-10-07 and 2024-10-08) would give 2024-10-06 for both.
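As a quick check of the tip above, here is a minimal sketch that applies lubridate's floor_date() to the two example dates. The variable names are only illustrative, and the result assumes lubridate's default week start (Sunday) has not been changed in your session.

```r
library(lubridate)

# The two example dates from the tip
dates <- as.Date(c("2024-10-07", "2024-10-08"))

# Round each date down to the start of its week (Sunday by default)
floored <- floor_date(dates, unit = "week")

floored
# Both values are 2024-10-06
```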

Try creating a new column with the ‘floored’ dates, and then summarise using these as groups.
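The general pattern might look something like the sketch below. It uses a small made-up table rather than the real dataset, and the column names date_in and week_start are hypothetical, so you will need to adapt them to the actual data.

```r
library(dplyr)
library(lubridate)

# Toy data standing in for the real dataset; `date_in` is a made-up column name
admissions <- tibble(
  date_in = as.Date(c("2024-10-07", "2024-10-08", "2024-10-14"))
)

weekly <- admissions |>
  # Add a new column holding each date rounded down to the start of its week
  mutate(week_start = floor_date(date_in, unit = "week")) |>
  # Count the rows in each week; the count is stored in a column called n
  count(week_start)

weekly
```

The resulting table has one row per week, which is the shape a line chart needs: the floored date on the x axis and the count on the y axis.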

Do the same, except using months instead of weeks, and visualise males and females separately.
Use a similar approach to make a scatterplot, but set the colours of the points to gender.

Final challenge:

You’ve been asked to visualise the professions in the dataset, as a bar chart. Let’s take a look at a summary of this data first (this is not live code):

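library(dplyr)
library(readr)

# Read the dataset into a data frame
bellevue_dataset <- read_csv('bellevue_almshouse_modified.csv')

# Count the number of rows for each profession, most frequent first
bellevue_dataset |>
  group_by(profession) |>
  summarise(n = n()) |>
  arrange(desc(n))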

As you can see, there are far too many professions listed to visualise as individual bars. How would you solve this? Try to think of a solution: you could use another visualisation method, or re-organise or rename the data to reach a pragmatic compromise.

If you can’t code the solution, you can simply write a description of what you would like to do, and we can try to solve it in the class!