17  Working with Dates and Times

18 Working with Dates and Times

Code
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
Warning: package 'tibble' was built under R version 4.5.2
Warning: package 'tidyr' was built under R version 4.5.2
Warning: package 'readr' was built under R version 4.5.2
Warning: package 'purrr' was built under R version 4.5.2
Warning: package 'dplyr' was built under R version 4.5.2
Warning: package 'stringr' was built under R version 4.5.2
Warning: package 'forcats' was built under R version 4.5.2
Warning: package 'lubridate' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

18.1 Dates are deceptively simple

Dates look simple. Everyone knows what “January 15, 2025” means. But date arithmetic surfaces problems quickly: months have different lengths, years sometimes have a 29th of February, time zones drift across daylight saving boundaries, and “the same date” in two different countries can refer to two different days because of formatting conventions. Most of the date bugs that ship to production trace to one of these.

R has built-in tools for dates, but they require you to memorize format strings like "%m/%d/%Y" (and yes, the capital Y matters). The lubridate package takes a different approach: just tell it what order the year, month, and day appear in, and it figures out the rest.

One note: as of tidyverse 2.0 (2023), lubridate is loaded automatically with library(tidyverse). Older code may still call library(lubridate) separately — both work; the explicit call is just redundant.

WarningAI Pitfall: date format ambiguity is the most expensive R bug there is

The single canonical AI-and-dates failure mode: source data uses one date format, AI assumes another, and the parse silently produces wrong (but plausible) dates.

A concrete example. Source data column has values like "03/04/2026". In US convention this is March 4, 2026. In most of the rest of the world it is April 3, 2026. If the AI uses mdy() and the data is European, every date is silently shifted by months — no errors, no warnings, just wrong results that propagate through every downstream filter and aggregation.

This exact bug, in exactly this form, is the source story in the preface to this book. It is not theoretical. It happens regularly.

The defensive habit:

  1. Before parsing dates, look at the raw strings. head(df$date_column, 20) and observe the format. If you see any value where the day is greater than 12, the format is unambiguous; if not, you have to know the source.
  2. After parsing, verify: range(parsed_dates) and confirm the min and max make sense for the dataset. If you have transactions from 2024 but the parsed dates show 2024-2042, something parsed wrong.
  3. For ambiguous data, ask the source. Email someone if you have to. A wrong assumption about date format will silently corrupt every analysis built on top of it.

When AI gives you a one-liner like df %>% mutate(date = mdy(date)), do not trust the parse without checking it.

18.2 Parsing Dates: ymd(), mdy(), dmy()

This is the single best thing about lubridate. Instead of memorizing format codes, you just pick the function that matches the order of your date components.

Year-Month-Day? Use ymd():

Code
ymd("2025-06-15")
[1] "2025-06-15"
Code
ymd("2025/06/15")
[1] "2025-06-15"
Code
ymd("20250615")
[1] "2025-06-15"

Month-Day-Year (the American way)? Use mdy():

Code
mdy("06-15-2025")
[1] "2025-06-15"
Code
mdy("June 15, 2025")
[1] "2025-06-15"
Code
mdy("Jun 15 2025")
[1] "2025-06-15"

Day-Month-Year (the rest-of-the-world way)? Use dmy():

Code
dmy("15-06-2025")
[1] "2025-06-15"
Code
dmy("15 June 2025")
[1] "2025-06-15"
Code
dmy("15/06/2025")
[1] "2025-06-15"

Notice that each function handles slashes, dashes, spaces, full month names, and abbreviations. You just tell it the order and lubridate does the rest. Compare that with base R’s as.Date("06/15/2025", format = "%m/%d/%Y") and you’ll never look back.

These functions are vectorized, so they work on entire columns:

Code
date_strings <- c("January 1, 2025", "February 14, 2025", "July 4, 2025", "December 25, 2025")
mdy(date_strings)
[1] "2025-01-01" "2025-02-14" "2025-07-04" "2025-12-25"

18.3 Adding Time: ymd_hms() and Friends

Need date AND time? Just add _hms (hours, minutes, seconds) to the function name:

Code
ymd_hms("2025-06-15 14:30:45")
[1] "2025-06-15 14:30:45 UTC"
Code
mdy_hm("June 15, 2025 2:30 PM")
[1] "2025-06-15 14:30:00 UTC"

You can also specify a time zone:

Code
ymd_hms("2025-06-15 14:30:45", tz = "America/New_York")
[1] "2025-06-15 14:30:45 EDT"

18.4 Building Dates from Separate Columns

Sometimes your data has year, month, and day in separate columns (looking at you, Excel exports). make_date() puts them together:

Code
event_data <- tibble(
 event = c("Conference", "Workshop", "Seminar", "Retreat"),
 yr = c(2025, 2025, 2025, 2026),
 mo = c(3, 6, 9, 1),
 dy = c(10, 22, 5, 15)
)

event_data %>%
 mutate(event_date = make_date(yr, mo, dy))
# A tibble: 4 × 5
  event         yr    mo    dy event_date
  <chr>      <dbl> <dbl> <dbl> <date>    
1 Conference  2025     3    10 2025-03-10
2 Workshop    2025     6    22 2025-06-22
3 Seminar     2025     9     5 2025-09-05
4 Retreat     2026     1    15 2026-01-15

18.5 Today and Now

Need the current date or time? Easy:

Code
[1] "2026-05-06"
Code
now()
[1] "2026-05-06 14:17:01 PDT"

These are surprisingly useful for things like calculating how many days until a deadline or filtering data to “last 30 days.”

18.6 Extracting Components: Year, Month, Day, etc.

Once you have a proper date object, you can pull it apart into pieces. This is where dates become really useful for business analysis — think “sales by month” or “orders by day of week.”

Code
dt <- ymd_hms("2025-06-15 14:30:45")

year(dt)
[1] 2025
Code
month(dt)
[1] 6
Code
mday(dt) # day of the month
[1] 15
Code
wday(dt) # day of the week (1 = Sunday by default)
[1] 1
Code
hour(dt)
[1] 14
Code
[1] 2

Want month names instead of numbers? Set label = TRUE:

Code
month(dt, label = TRUE)
[1] Jun
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Code
month(dt, label = TRUE, abbr = FALSE)
[1] June
12 Levels: January < February < March < April < May < June < ... < December

Same for day of the week:

Code
wday(dt, label = TRUE)
[1] Sun
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Code
wday(dt, label = TRUE, abbr = FALSE)
[1] Sunday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Here’s how this looks in a real workflow — breaking a transaction date into analysis-friendly components:

Code
sales_data <- tibble(
 transaction_date = ymd_hms(c(
 "2025-01-10 09:15:00", "2025-02-14 11:30:00",
 "2025-03-22 16:45:00", "2025-04-01 08:00:00",
 "2025-05-18 13:20:00", "2025-06-30 17:55:00"
 )),
 amount = c(150, 280, 95, 420, 175, 310)
)

sales_data %>%
 mutate(
 sale_month = month(transaction_date, label = TRUE),
 sale_day = wday(transaction_date, label = TRUE),
 sale_quarter = quarter(transaction_date)
 )
# A tibble: 6 × 5
  transaction_date    amount sale_month sale_day sale_quarter
  <dttm>               <dbl> <ord>      <ord>           <int>
1 2025-01-10 09:15:00    150 Jan        Fri                 1
2 2025-02-14 11:30:00    280 Feb        Fri                 1
3 2025-03-22 16:45:00     95 Mar        Sat                 1
4 2025-04-01 08:00:00    420 Apr        Tue                 2
5 2025-05-18 13:20:00    175 May        Sun                 2
6 2025-06-30 17:55:00    310 Jun        Mon                 2

Now you can group by month, day of week, or quarter. That’s how you answer questions like “Do we sell more on weekdays or weekends?” or “Which quarter had the highest revenue?”

18.7 Date Math: Adding and Subtracting Time

This is one of the best parts. You can just add time to dates. Lubridate gives you two ways to think about it:

Durations — exact amounts of time measured in seconds:

Code
ymd("2025-01-01") + ddays(30)
[1] "2025-01-31"
Code
ymd("2025-01-01") + dweeks(2)
[1] "2025-01-15"
Code
ymd_hms("2025-06-15 10:00:00") + dhours(3)
[1] "2025-06-15 13:00:00 UTC"

Periods — calendar-aware amounts (respects different month lengths):

Code
ymd("2025-01-01") + months(6)
[1] "2025-07-01"
Code
ymd("2025-01-31") + months(1) # February doesn't have 31 days, so you get NA
[1] NA
Code
ymd("2025-03-15") + years(2) + months(3) + days(10)
[1] "2027-06-25"

When to use which? Use durations when you need an exact elapsed time (“How many seconds did this process take?”). Use periods when you need calendar math (“What’s the date 3 months from now?”). For most business analysis, periods are what you want.

You can also just subtract dates to get the difference:

Code
as.numeric(ymd("2025-12-31") - ymd("2025-01-01"))
[1] 364

That’s the number of days between two dates. Simple.

18.8 Intervals: Did This Event Happen During This Period?

An interval is just a span of time with a start and end. The main use case is checking whether a date falls within a range.

Code
spring_semester <- ymd("2025-01-13") %--% ymd("2025-05-09")

ymd("2025-03-15") %within% spring_semester
[1] TRUE
Code
ymd("2025-06-01") %within% spring_semester
[1] FALSE

Useful for questions like “Did this sale happen during our Q2 promotion?” or “Was this employee hired during the pandemic?”

18.9 Rounding Dates: Grouping by Time Period

floor_date() rounds a date down to the nearest unit. This is incredibly useful for aggregating data — turning daily transactions into monthly summaries, for example.

Code
transactions <- tibble(
 date = ymd(c(
 "2025-01-05", "2025-01-18", "2025-01-29",
 "2025-02-03", "2025-02-14", "2025-02-27",
 "2025-03-08", "2025-03-15", "2025-03-30"
 )),
 revenue = c(1200, 950, 1100, 1350, 800, 1500, 1050, 1200, 900)
)

transactions %>%
 mutate(month = floor_date(date, unit = "month")) %>%
 group_by(month) %>%
 summarize(
 total_revenue = sum(revenue),
 n_transactions = n()
 )
# A tibble: 3 × 3
  month      total_revenue n_transactions
  <date>             <dbl>          <int>
1 2025-01-01          3250              3
2 2025-02-01          3650              3
3 2025-03-01          3150              3

The output shows three rows — one per month — with total revenue and transaction count. January had the highest total revenue ($3,250 from 3 transactions), while February had the highest single-transaction count of 3 as well but a different total. This floor_date() + group_by() pattern is incredibly common in business reporting — it turns granular daily data into the monthly summaries that dashboards and management reports are built on.

You can also round to weeks, quarters, or even custom intervals like “15 minutes” or “6 hours” if you’re working with high-frequency data.

18.10 Time Zones: The One Paragraph Version

Time zones are complicated, but here’s what you need to know. with_tz() converts a time for display (“What time is this New York meeting in Tokyo?”) without changing the actual moment. force_tz() corrects a mislabeled time zone (“This was recorded as UTC but it was actually Eastern”). If you’re working with data from a single time zone, you probably don’t need either of these.

Code
meeting_time <- ymd_hms("2025-06-15 10:00:00", tz = "America/New_York")
with_tz(meeting_time, tzone = "Asia/Tokyo") # same moment, displayed in Tokyo time
[1] "2025-06-15 23:00:00 JST"

18.12 Quick Reference

What you want to do Function
Parse a date string ymd(), mdy(), dmy() (match the order)
Parse date + time ymd_hms(), mdy_hm(), etc.
Build from components make_date(), make_datetime()
Current date/time today(), now()
Extract year, month, etc. year(), month(), mday(), wday(), quarter()
Add calendar time + months(), + years(), + days()
Add exact time + dhours(), + ddays(), + dweeks()
Round to period floor_date(), ceiling_date(), round_date()
Check if date is in range %within% with intervals (%--%)

Dates are weird in every language. But with lubridate, they’re manageable. Parse them, extract what you need, do the math, and get on with your analysis.