22 AI-Assisted R Programming

22.1 The Pair Programmer That Never Sleeps

In 2023, a data analyst at a mid-size consulting firm was asked to clean and visualize a client’s sales data. She had been learning R for about six months. The dataset had 14 columns, inconsistent date formats, and customer names split across three fields. She estimated the work would take her most of a day.

She opened Claude, described the dataset structure, and asked for a tidyverse pipeline to clean and reshape the data. Thirty seconds later, she had working code. She ran it. It produced output. She sent the visualization to her manager.

The next morning, the client called. The chart showed Q4 revenue doubling Q3. That was wrong. The date parsing code had silently read European-format dates (day/month/year) as American-format dates (month/day/year), shifting several months of transactions into incorrect quarters. The code ran without errors. The output looked plausible. Nothing flagged the problem.

The analyst had used AI as an answer machine. She needed to use it as a drafting tool.
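The failure in this story is easy to reproduce. The sketch below uses base R's as.Date() with two explicit format strings. The date strings are invented for illustration and are not the client's data.

```r
# Hypothetical date strings as they might appear in a European export (day/month/year).
raw_dates <- c("03/04/2023", "05/07/2023", "25/12/2023")

# Parse the same strings with the European and the American format.
as_dmy <- as.Date(raw_dates, format = "%d/%m/%Y")
as_mdy <- as.Date(raw_dates, format = "%m/%d/%Y")

# The first two strings are valid under both formats but land in different quarters.
data.frame(
  raw         = raw_dates[1:2],
  quarter_dmy = quarters(as_dmy[1:2]),
  quarter_mdy = quarters(as_mdy[1:2])
)

# A cheap check: a day value above 12 cannot be a month, so a string that
# parses under one format but becomes NA under the other reveals the true format.
sum(is.na(as_dmy))  # strings that fail as day/month/year
sum(is.na(as_mdy))  # strings that fail as month/day/year
```

Neither call raises an error or a warning. The only visible trace of the wrong format is an NA count that nobody asked for.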

This chapter covers how to use large language models effectively as R programming partners. The core principle is simple: AI is good at generating code, bad at knowing whether the code is appropriate. Your job is the second part.

22.2 Business Application

Working analysts use AI-assisted coding daily. The difference between productive use and dangerous use is verification. If you manage a team that uses R, you need people who can read AI-generated code critically, not just people who can prompt for it. This chapter builds that skill.

22.3 What AI Can Do Well

Large language models are effective at several R programming tasks.

Translating intent into syntax. If you know what you want to do but cannot remember the function name or argument order, an LLM can bridge the gap. “Group this dataframe by region, compute the mean and standard deviation of revenue, and arrange by descending mean” becomes a clean dplyr pipeline in seconds.
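As a rough illustration, the request quoted above might come back as something like the following. The data frame and its column names (region, revenue) are assumptions made up for this sketch, and the native pipe |> requires R 4.1 or later.

```r
library(dplyr)

# Hypothetical sales data; the column names are assumptions for illustration.
sales <- tibble(
  region  = c("North", "North", "South", "South", "West", "West"),
  revenue = c(100, 140, 80, 120, 200, 260)
)

region_summary <- sales |>
  group_by(region) |>
  summarise(
    mean_revenue = mean(revenue),
    sd_revenue   = sd(revenue),
    .groups = "drop"
  ) |>
  arrange(desc(mean_revenue))

region_summary
```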

Explaining existing code. Paste a code block you inherited or found online and ask for a line-by-line explanation. This is one of the most reliably useful applications. The explanation is almost always accurate for standard tidyverse code.

Debugging error messages. R error messages are famously unhelpful. Paste the error, the code that produced it, and a sample of your data. The LLM will usually identify the problem faster than you can search Stack Overflow.
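One way to collect the exact error text for a prompt is to capture it with tryCatch(). This is a minimal sketch: the data frame and the misspelled column name are invented for the example.

```r
# Hypothetical data with a deliberately misspelled column in the call below.
sales <- data.frame(region = c("North", "South"), revenue = c(100, 80))

# Capture the error message as text instead of stopping the script.
error_text <- tryCatch(
  mean(sales[, "revenu"]),                  # typo: the column is called "revenue"
  error = function(e) conditionMessage(e)
)

error_text      # the message to paste into the prompt
head(sales, 5)  # a small sample of the data to paste alongside it
```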

Generating boilerplate. ggplot2 themes, R Markdown YAML headers, function documentation, package skeleton files. Anything repetitive and well-documented is a good candidate.

Converting between frameworks. Base R to tidyverse. Wide to long. ggplot2 to plotly. These are translation tasks, and LLMs handle them well when the source code is clear.
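A translation is only useful if it computes the same thing. The sketch below, which uses the built-in mtcars data as a stand-in, shows a base R summary, one possible tidyverse translation, and a check that the two agree.

```r
library(dplyr)

# Base R version: mean mpg by number of cylinders.
base_result <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# A tidyverse translation of the same computation.
tidy_result <- mtcars |>
  group_by(cyl) |>
  summarise(mpg = mean(mpg), .groups = "drop")

# Verify the translation instead of trusting it: the numbers should match.
all.equal(base_result$mpg, tidy_result$mpg)
```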

22.4 What AI Gets Wrong

The failure modes are predictable enough that you can watch for them.

Hallucinated packages and functions. LLMs sometimes invent package names or function arguments that do not exist. If you see a package name you have never heard of, check CRAN before installing it. If a function name seems unfamiliar, run ?function_name before trusting it.
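Beyond ?function_name, a few base R calls can confirm that a function and its arguments exist in an installed package. The helper function_exists() below is a hypothetical name written for this sketch, not part of any package, and the "made-up" names are placeholders for whatever an LLM suggested.

```r
# TRUE only if the package is installed locally and exports the function.
function_exists <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

function_exists("stats", "sd")            # a real function
function_exists("stats", "sd_robust")     # a made-up function name
function_exists("notarealpkg123", "fit")  # a made-up package name

# Check that an argument exists before trusting a suggested call.
"na.rm" %in% names(formals(stats::sd))      # real argument
"ignore_na" %in% names(formals(stats::sd))  # made-up argument
```

This only checks what is installed on your machine. A package that is missing locally may still exist on CRAN, so an unfamiliar name still needs a look at CRAN itself.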

Version conflicts. LLMs train on code from many time periods. They may generate code using syntax that was valid in an older version of a package but has since been deprecated or superseded. The tidyverse evolves. In tidyr, gather() and spread() were superseded by pivot_longer() and pivot_wider(); the old functions still run, but new code should use the replacements. LLMs still generate the old versions.
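For reference, here is the same reshape written both ways. The wide table is a hypothetical example; gather() still runs in current tidyr, which is part of why the older style goes unnoticed.

```r
library(tidyr)

# Hypothetical wide table: one revenue column per quarter.
wide <- data.frame(
  region = c("North", "South"),
  q1 = c(100, 80),
  q2 = c(140, 120)
)

# Older style that LLMs still produce.
long_old <- gather(wide, key = "quarter", value = "revenue", q1, q2)

# Current style.
long_new <- pivot_longer(
  wide,
  cols = c(q1, q2),
  names_to = "quarter",
  values_to = "revenue"
)

# Same four observations either way; only the row order differs.
nrow(long_old)
nrow(long_new)
```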

Confidently wrong analysis choices. An LLM will run a paired t-test on independent samples if you describe the data ambiguously. It will use cor() on ordinal data without comment. It will fit a linear model to data with severe heteroscedasticity and report the coefficients as if they were trustworthy. The code runs. The output looks professional. The analysis is wrong.
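To see how quietly the first of these mistakes happens, the sketch below runs both tests on simulated data for two independent groups. The numbers are generated, not real, and the group names are placeholders.

```r
# Simulated, hypothetical scores for two separate groups of customers.
set.seed(42)
group_a <- rnorm(20, mean = 50, sd = 10)
group_b <- rnorm(20, mean = 55, sd = 10)

# Wrong for independent groups: treats element i of each vector as the same subject.
paired_test <- t.test(group_a, group_b, paired = TRUE)

# Appropriate for independent groups (Welch's t-test is R's default).
welch_test <- t.test(group_a, group_b)

# Both calls run cleanly, but they are different tests with different results.
c(paired = paired_test$method, welch = welch_test$method)
c(paired = paired_test$p.value, welch = welch_test$p.value)
```

The only difference in the code is one argument. Nothing in the output says which version matches how the data were collected.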

Fabricated data characteristics. If you describe your data vaguely, the LLM will fill in assumptions. It may assume your data is normally distributed, that your sample is random, or that your variables are independent. It will not tell you it made these assumptions. You have to check.

Incomplete error handling. AI-generated code often handles the happy path well but breaks on edge cases. Missing values, empty groups, unexpected factor levels, and encoding issues are common sources of silent failure.
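A few base R calls show what these silent failures look like in practice. The vectors are invented for illustration.

```r
# Hypothetical revenue vector with one missing value.
revenue <- c(100, NA, 140)

mean(revenue)                # NA: one missing value propagates to the result
mean(revenue, na.rm = TRUE)  # 120

# Silent failures: each call returns a value instead of signalling a problem.
all_missing <- c(NA_real_, NA_real_)
sum(all_missing, na.rm = TRUE)   # 0, which looks like genuine zero revenue
mean(all_missing, na.rm = TRUE)  # NaN
sd(5)                            # NA: one observation has no standard deviation
```

Adding na.rm = TRUE fixes the first case and creates the next two, which is why edge cases need to be tested rather than patched.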

22.5 AI Reality Check

In a 2024 study, researchers asked GPT-4 to write R code for 50 standard data analysis tasks. The code ran without errors in 43 of 50 cases. But in 11 of those 43 cases, the code produced incorrect or misleading results. That leaves at most 32 of the 50 tasks (64%) with code that both ran and gave a correct answer. The most common failure: choosing an inappropriate statistical method for the data structure described. The code was syntactically correct and analytically wrong. This is the failure mode that matters most.

22.6 Prompting Strategies That Work

The quality of AI-generated R code depends heavily on how you describe your task.

Specify the framework. Say “using dplyr and tidyr” or “using base R only.” Without this, you get unpredictable mixes of styles.

Describe the data structure. Include column names, data types, and a few example rows. The more specific you are, the more accurate the code.
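Two base R calls produce most of what a prompt needs. This is a sketch on a made-up data frame; substitute your own object for sales.

```r
# Hypothetical data frame standing in for your own data.
sales <- data.frame(
  order_date = c("03/04/2023", "05/07/2023", "25/12/2023"),
  region     = c("North", "South", "West"),
  revenue    = c(100, 80, 260)
)

# Column names and types.
sapply(sales, class)

# A copy-pasteable sample: dput() prints R code that recreates these rows.
dput(head(sales, 3))
```

Note that order_date is stored as text here. Stating that in the prompt, along with the intended day/month/year order, removes one assumption the LLM would otherwise make for you.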

Ask for step-by-step explanation. Instead of “write code to do X,” try “write code to do X and explain each step.” This forces the LLM to articulate its reasoning, making errors easier to spot.

Request assumption checks. “Before running this test, check whether the assumptions are met.” This prompts the LLM to include diagnostic code it would otherwise skip.
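For a two-sample comparison, the diagnostic code such a prompt might yield could look like the sketch below. The data are simulated, and the choice of Shapiro-Wilk and the F test is one common option among several, not the only valid set of checks.

```r
# Simulated, hypothetical data for two independent groups.
set.seed(1)
group_a <- rnorm(30, mean = 50, sd = 10)
group_b <- rnorm(30, mean = 55, sd = 10)

# Normality within each group (Shapiro-Wilk); a small p-value suggests non-normality.
normal_a <- shapiro.test(group_a)
normal_b <- shapiro.test(group_b)

# Equality of variances (F test); a small p-value suggests unequal variances.
equal_var <- var.test(group_a, group_b)

c(normal_a = normal_a$p.value, normal_b = normal_b$p.value, equal_var = equal_var$p.value)

# Welch's t-test, R's default, does not assume equal variances.
t.test(group_a, group_b)
```

The checks produce numbers. Deciding what to do when one of them fails is still your call.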

Iterate, do not accept. Treat the first response as a draft. Read the code. Run it on a small sample. Check the output against what you expect. Then ask for modifications.

22.7 The Verification Layer

Every piece of AI-generated code needs human verification before it informs a decision. Here is a practical checklist.

Read the code before running it. Can you explain what each line does? If not, ask the LLM to explain it, then verify the explanation.
Check package names against CRAN. For any unfamiliar package, run "pkgname" %in% rownames(available.packages()) (this needs an internet connection) or search cran.r-project.org.
Run on a small, known dataset first. Use a subset where you can verify the output by hand. If the code produces the right answer on five rows, it is more likely to be right on five thousand.
Inspect intermediate results. Do not just look at the final output. Print the dataframe after each transformation step. Are the dimensions right? Are the values plausible?
Check statistical appropriateness. Is this the right test for your data? Are the assumptions met? Does the sample size support the analysis? These are judgment calls that AI cannot make for you.
Test edge cases. What happens with missing values? Empty groups? A single observation? If the code breaks silently, you have a problem.
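Several items on this checklist can be combined into a habit: run the pipeline on a handful of rows, inspect each step, and compare the result with a hand calculation. The sketch below assumes a hypothetical five-row data set with columns region and revenue.

```r
library(dplyr)

# Five hypothetical rows, small enough to check by hand, including one edge case (NA).
small <- tibble(
  region  = c("North", "North", "South", "South", "South"),
  revenue = c(100, 200, 50, NA, 150)
)

# Step 1: drop missing revenue, then inspect before moving on.
step1 <- small |> filter(!is.na(revenue))
dim(step1)  # 4 rows, 2 columns

# Step 2: summarise by region.
step2 <- step1 |>
  group_by(region) |>
  summarise(mean_revenue = mean(revenue), n = n(), .groups = "drop")
step2

# Hand calculation: North = (100 + 200) / 2 = 150, South = (50 + 150) / 2 = 100.
expected <- c(150, 100)
stopifnot(isTRUE(all.equal(step2$mean_revenue, expected)))
stopifnot(sum(step2$n) == nrow(step1))  # no rows lost or duplicated
```

If either stopifnot() fails, the script stops there instead of passing a wrong number downstream.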

22.8 Reproducibility and Attribution

AI-assisted analysis introduces new questions about reproducibility and credit.

Document your process. Note which parts of your analysis were AI-assisted and which were written from scratch. This is not a confession. It is good practice.

Save your prompts. The prompt is part of your methodology. If someone wants to reproduce your analysis, they need to know what you asked the AI to do and how you modified the output.

Version your code, not the AI output. The final, verified code goes into your script or R Markdown file. The raw AI output is a draft, not a deliverable.

Cite the tool. If you used Claude, ChatGPT, or Copilot to generate substantial portions of your analysis code, say so. The norms are still forming, but transparency is always the right default.
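One lightweight way to follow these practices is a provenance header at the top of the verified script. Everything in the header below is a hypothetical example, including the file path and the described changes. Adapt the fields to your own project.

```r
# ---- Provenance (hypothetical example of a script header) ----
# Tool:     <name and version of the AI assistant used>
# Prompt:   saved in prompts/clean_sales.md (hypothetical path)
# Changes:  replaced date parsing with an explicit day/month/year format after review
# Verified: hand-checked on a five-row sample

set.seed(2024)  # fix random numbers so simulated or resampled results can be reproduced

# Record the R version used, so others can match the environment.
session <- sessionInfo()
session$R.version$version.string
```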

22.9 Practical Workflows

Here are three workflows that work well in practice.

Workflow 1: Scaffolding. You know the analysis plan. You use AI to generate the code skeleton. You modify, verify, and complete it. This is the most common productive pattern.

Workflow 2: Debugging partner. You wrote the code. It does not work. You paste the error and your code into an LLM and ask for help. The LLM spots the typo, the missing argument, or the type mismatch. You fix it.

Workflow 3: Learning accelerator. You encounter a new package or technique. You ask the LLM to generate an example, explain it, and suggest exercises. You work through them. The LLM is a tutor, not a replacement for learning.

The workflow that does not work: paste a vague description of your data and your goal, accept the first code block, run it, and report the results. That is not analysis. That is outsourcing your judgment to a system that has none.

22.10 Ethics Moment

If you submit an analysis where AI wrote the code and you did not verify it, who is responsible for errors in the results? You are. The tool does not have accountability. You do. What does that imply about how much verification is “enough” before you present findings to a client or a manager?

22.11 What AI Cannot Replace

AI can write a dplyr pipeline. It cannot tell you whether the pipeline answers the right question. It can compute a correlation. It cannot tell you whether the correlation is meaningful in your business context. It can generate a chart. It cannot tell you whether the chart tells a true story.
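The anscombe data set that ships with R makes the point about correlation concrete. The sketch below computes the correlation for each of its four x/y pairs and then plots them. The choice of this data set as an illustration is ours, not something the chapter's opening story depends on.

```r
# anscombe ships with base R: four x/y pairs with very different shapes.
correlations <- c(
  set1 = cor(anscombe$x1, anscombe$y1),
  set2 = cor(anscombe$x2, anscombe$y2),
  set3 = cor(anscombe$x3, anscombe$y3),
  set4 = cor(anscombe$x4, anscombe$y4)
)
round(correlations, 2)  # all four round to 0.82

# Plot each pair before believing the number.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```

The four correlations are nearly identical. The four plots are not, and only a person looking at them can say which one, if any, supports a linear story.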

The skills that matter most are the ones AI handles worst:

Deciding what question to ask
Judging whether the data can answer it
Choosing the right analytical approach
Interpreting results in context
Communicating findings honestly
Recognizing when an analysis is misleading

These are the skills this book has been building across every chapter. They do not become less important because AI can write code. They become more important, because the cost of running a bad analysis has dropped to nearly zero.

Key Terms

Hallucinated package: A package name generated by an LLM that does not exist on CRAN
Prompt engineering: The practice of writing specific, structured requests to get better output from an LLM
Verification layer: The human step between receiving AI-generated code and using it for decisions
Reproducibility: The ability for another person to run your code and get the same results
Attribution: Documenting which parts of an analysis were AI-assisted

Exercises

22.11.1 Check Your Understanding

Name three tasks where AI-assisted R coding is reliably useful.
What is a “hallucinated package” and how do you check for one?
Why might AI-generated code run without errors but still produce incorrect results?
What information should you include in a prompt to get better R code from an LLM?
What does it mean to “version your code, not the AI output”?
Name two R functions that were deprecated or superseded in the tidyverse and might still appear in AI-generated code.
Why is specifying “using dplyr” or “using base R” important when prompting for code?
What is the verification layer and why does it matter?

22.11.2 Apply It

Use an LLM to generate a dplyr pipeline that reads a CSV file, filters rows where a numeric column exceeds its median, groups by a categorical column, and computes the mean of the numeric column. Run the code on a dataset from this book. Verify each step by printing intermediate results.
Ask an LLM to write code for a two-sample t-test comparing two groups. Then ask it to include assumption checks (normality, equal variances). Compare the two versions. What did the first version skip?
Generate a ggplot2 visualization using an LLM. Ask for a version with proper axis labels, a descriptive title, and a colorblind-safe palette. Evaluate whether the output meets the visualization standards from this book.
Take a piece of R code from an earlier chapter in this book. Paste it into an LLM and ask for a line-by-line explanation. Evaluate whether the explanation is accurate.
Ask an LLM to convert a base R analysis (using aggregate() and plot()) to tidyverse equivalents (using group_by(), summarize(), and ggplot()). Compare the outputs.
Prompt an LLM to “analyze this dataset” with minimal description. Document every assumption the LLM makes. Then re-prompt with specific details. Compare the quality of the two outputs.
Generate code for a linear regression using an LLM. Ask it to include residual diagnostics. Run the diagnostics on actual data and interpret whether the model assumptions are met.
Use an LLM to debug a deliberately broken R script. Introduce three common errors (a typo in a function name, a missing closing parenthesis, and an incorrect column name). See if the LLM identifies all three.

22.11.3 Think Deeper

A junior analyst presents a report where all the R code was generated by an LLM. The results look correct. The analyst cannot explain what the code does line by line. Should the report be accepted? What risks does this create?
Some argue that AI-assisted coding makes learning R unnecessary. Others argue it makes learning R more important. Construct the strongest version of each argument, then explain which you find more persuasive and why.
A consulting firm adopts a policy that all client-facing analyses must include a disclosure statement indicating whether AI was used in the coding process. What are the benefits and risks of this policy? Would you recommend it?
Consider the date-parsing error from the opening story. Design a verification protocol that would have caught this error. How much additional time would the protocol require? Is the tradeoff worthwhile?
