Automating Your Academic CV, Biosketch, and Website with R

CCHMC R Users Group

Cole Brokamp

2022-09-14

👋 Welcome to the CCHMC R Users Group (RUG) Meeting

Why Automate?

  • automating a manual process is what data people do!
  • information in your CV is data and can be extracted for different purposes (CV/resume, biosketch, website)
  • adapt to changing formatting requirements by changing code, not data
  • don’t “lock yourself in” by using a specific online service or closed source software
  • share links on your website (or send them directly to people who need them) for access to your biosketch and CV

Existing Packages for R

datadrivencv

vitae

Let’s Build Our Own

CV Automation Workflow

YAML Ain’t Markup Language (YAML)

YAML is a human- and machine-readable data-serialization language for all programming languages.

  • list structure often used for configuration files
  • whitespace indentation (spaces, not tabs!) specifies the structure
  • YAML reference card
  • R (and RStudio?) seemingly favors YAML over JSON
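To see how YAML maps onto R data structures, here is a minimal sketch using the yaml package. The key `example-2022` and its field values are made up for illustration.

```r
library(yaml)

# Hypothetical YAML text; two-space indentation nests the fields under the key.
# Values containing ": " must be quoted, as in the title below.
yaml_text <- '
example-2022:
  title: "An Example Title: With a Colon"
  year: 2022
  issue_pages: "7:100155"
'

entry <- yaml.load(yaml_text)

# Top-level keys become the names of a list; nested fields become a named list.
names(entry)
str(entry[["example-2022"]])
```

Unquoted numbers such as `year` come back as numeric values, while quoted values such as `issue_pages` stay character strings.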

pubs.yaml

pennington-2022:
  title: "Racial Fairness in Precision Medicine: Pediatric Asthma Prediction Algorithms"
  author: Jordan Pennington, Erika Rasnick, Lisa J. Martin, Jocelyn M. Biagini, Tesfaye B. Mersha, Allison Parsons, Gurjit K. Khurana Hershey, Patrick Ryan, Cole Brokamp
  journal: American Journal of Health Promotion
  issue_pages: Online
  year: 2022
  doi: 10.1177/08901171221121639

esteban-2022:
  title: "Understanding Racial Disparities in Childhood Asthma Using Individual- and Neighborhood-Level Risk Factors"
  author: Esteban Correa, Lili Ding, Andrew F. Beck, Cole Brokamp, Mekibib Altayeb, Robert S. Kahn, Tesfay Mersha
  journal: Journal of Allergy and Clinical Immunology
  issue_pages: In Press
  year: 2022
  doi: 10.1016/j.jaci.2022.07.024

brokamp-2022:
  title: "A High Resolution Spatiotemporal Fine Particulate Matter Exposure Assessment Model for the contiguous United States"
  author: Cole Brokamp
  journal: Environmental Advances
  issue_pages: "7:100155"
  year: 2022
  doi: 10.1016/j.envadv.2021.100155

talks.yaml

PAS-2022:
  title: "Decentralized Geomarker Assessment for Multi-Site Studies"
  event: Pediatric Academic Societies Annual Meeting
  year: 2022
  location: Denver, CO

NIH-2022:
  title: "Challenges and Solutions for Private and Reproducible Environmental Exposure Assessment at Scale"
  event: NIH Ethical, Legal, and Social Implications of Gene-Environment Interaction Research Workshop
  year: 2022
  location: Online
  download_link: "https://colebrokamp-website.s3.amazonaws.com/talks/GxE_ELSI_Brokamp.pdf"

Use R to create markdown files

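# read the publications, format each entry as a markdown citation,
# bold my name, and write everything to pubs.md
yaml::yaml.load_file("pubs.yaml") |>
  purrr::modify(
    # glue_data() (not glue()) looks up the {fields} in the list passed as .x
    ~ glue::glue_data(
      .x = .,
      "{author}. {title}. *{journal}*. {issue_pages}. {year}."
    )
  ) |>
  purrr::modify(
    ~ gsub(
      x = .,
      pattern = "Cole Brokamp",
      replacement = "**Cole Brokamp**"
    )
  ) |>
  # separate citations with a blank line so each becomes its own paragraph
  paste(collapse = "\n\n") |>
  cat(file = "pubs.md")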
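The same idea could be applied to talks.yaml, where some entries carry an optional `download_link`. The sketch below is one possible approach, not the talk's actual parse.R: the helper `format_talk()` and the citation layout are assumptions, and the YAML is inlined so the example runs on its own.

```r
library(yaml)

# Hypothetical helper: format one talk as a markdown line.
# `download_link` is optional, so only add a link when it is present.
format_talk <- function(talk) {
  line <- sprintf(
    "%s. *%s*. %s. %s.",
    talk$title, talk$event, talk$location, talk$year
  )
  if (!is.null(talk$download_link)) {
    line <- paste0(line, " [slides](", talk$download_link, ")")
  }
  line
}

# Inline sample; in practice this would be yaml::yaml.load_file("talks.yaml").
talks <- yaml.load('
PAS-2022:
  title: "Decentralized Geomarker Assessment for Multi-Site Studies"
  event: Pediatric Academic Societies Annual Meeting
  year: 2022
  location: Denver, CO
')

talks_md <- paste(vapply(talks, format_talk, character(1)), collapse = "\n\n")
cat(talks_md, "\n")
```

Writing the result with `cat(talks_md, file = "talks.md")` would produce the talks.md file that later steps expect.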

Use R to create complicated markdown files

https://github.com/cole-brokamp/support/blob/main/parse_support.R

Pandoc

Convert markdown to tex file:

pandoc -o pubs.tex pubs.md

Convert markdown to MS Word file using a reference document:

pandoc pubs.md --reference-doc=reference.dotx -o pubs.docx
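If you prefer to stay inside R, the same conversion can be driven with `system2()`. This is a minimal sketch: the helper `pandoc_args()` is hypothetical, and the conversion only runs when pandoc and the input files are actually present.

```r
# Hypothetical helper: build the pandoc arguments for a conversion.
pandoc_args <- function(input, output, reference_doc = NULL) {
  args <- c(input, "-o", output)
  if (!is.null(reference_doc)) {
    args <- c(args, paste0("--reference-doc=", reference_doc))
  }
  args
}

args <- pandoc_args("pubs.md", "pubs.docx", reference_doc = "reference.dotx")

# Only run the conversion when pandoc and the input files are available.
if (nzchar(Sys.which("pandoc")) && all(file.exists("pubs.md", "reference.dotx"))) {
  system2("pandoc", args)
}
```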

CV

Write the CV in LaTeX and pull the generated tex files into the LaTeX document:

\include{pubs}

R Markdown

Site generation with {rmarkdown}

rmarkdown::render_site()
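`render_site()` expects a `_site.yml` configuration file next to the site's source files. The sketch below writes a minimal one into a temporary directory; the site name, navbar entries, and page contents are placeholders, and `output_dir` is set to `docs` on the assumption that the site is served from that folder.

```r
library(yaml)

# Work in a throwaway directory so nothing in your project is overwritten.
site_dir <- file.path(tempdir(), "cv-site")
dir.create(site_dir, showWarnings = FALSE)

# Placeholder configuration; adjust names and links for your own site.
site_config <- list(
  name = "my-website",
  output_dir = "docs",
  navbar = list(
    title = "My Name",
    left = list(
      list(text = "Home", href = "index.html"),
      list(text = "Publications", href = "pubs.html")
    )
  )
)
write_yaml(site_config, file.path(site_dir, "_site.yml"))

# A minimal home page for the site.
writeLines(
  c("---", "title: \"Home\"", "---", "", "Welcome."),
  file.path(site_dir, "index.Rmd")
)

# rmarkdown::render_site(site_dir)  # uncomment to build; needs rmarkdown and pandoc
```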

Hosting on GitHub

Workflow

Makefile

Make

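# `all` and `site` name tasks, not files
.PHONY: all site

all: my_pubs.docx cv.pdf site

pubs.md talks.md: pubs.yaml talks.yaml
	R CMD BATCH parse.R

# convert any markdown file to tex (same command as in the Pandoc section)
%.tex: %.md
	pandoc -o $@ $<

my_pubs.docx: pubs.md
	pandoc pubs.md --reference-doc=reference.dotx -o my_pubs.docx

cv.pdf: pubs.tex talks.tex cv.tex
	texfot pdflatex cv.tex

# `open` is macOS-specific; use xdg-open on Linux
site: pubs.md talks.md cv.pdf
	R -e "rmarkdown::render_site(encoding = 'UTF-8')"
	cp cv.pdf docs/cv.pdf
	open docs/index.html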

Update Your CV

  1. edit pubs.yaml and talks.yaml
  2. make all
  3. (commit & push)

Applications

R-Centric Alternatives

Extensions

  • use alternative data storage solutions (Google spreadsheets, local CSV)
  • use Google Scholar, ORCID, etc. API to get publications
  • automate citation creation with a PubMed/DOI API
  • hosting and download of published manuscripts
  • GitHub Actions to automate running make and deployment
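As one illustration of the DOI idea, the doi.org resolver supports content negotiation and can return citation metadata as CSL-JSON. The sketch below is an assumption-laden example, not part of the original workflow: `csl_to_entry()` and `fetch_csl()` are hypothetical helpers, the field names follow CSL-JSON as commonly returned for journal articles, and the sample metadata is hand-written from pubs.yaml, not fetched.

```r
library(yaml)

# Hypothetical helper: convert parsed CSL-JSON metadata to a pubs.yaml-style entry.
csl_to_entry <- function(csl) {
  authors <- vapply(
    csl$author,
    function(a) paste(a$given, a$family),
    character(1)
  )
  list(
    title = csl$title,
    author = paste(authors, collapse = ", "),
    journal = csl[["container-title"]],
    year = csl$issued[["date-parts"]][[1]][[1]],
    doi = csl$DOI
  )
}

# Hypothetical helper, not called here: fetch metadata for a DOI (needs network,
# plus the httr and jsonlite packages).
fetch_csl <- function(doi) {
  resp <- httr::GET(
    paste0("https://doi.org/", doi),
    httr::add_headers(Accept = "application/vnd.citationstyles.csl+json")
  )
  httr::stop_for_status(resp)
  jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8"),
    simplifyVector = FALSE
  )
}

# Hand-written stand-in for a response, using values from pubs.yaml.
csl <- list(
  title = "A High Resolution Spatiotemporal Fine Particulate Matter Exposure Assessment Model for the contiguous United States",
  author = list(list(given = "Cole", family = "Brokamp")),
  `container-title` = "Environmental Advances",
  issued = list(`date-parts` = list(list(2022))),
  DOI = "10.1016/j.envadv.2021.100155"
)

cat(as.yaml(list(`brokamp-2022` = csl_to_entry(csl))))
```

Real responses may omit fields or structure them differently, so the output would still deserve a manual check before it goes into pubs.yaml.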

R as an interface to other tools/languages

  • the bottleneck in data science programming is often reading and writing code, not executing it
  • R can be considered a “user interface” to other tools
  • domain-specific languages (DSLs) in R for computing on code
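"Computing on code" means treating R code as data that can be inspected, rewritten, and only then evaluated. A minimal base R sketch, with made-up variable names:

```r
# Capture code as data instead of running it.
expr <- quote(pm25 * 2 + offset)

# Inspect the expression: which variables does it reference?
all.vars(expr)

# Rewrite the code programmatically, then turn it back into text.
expr2 <- do.call(substitute, list(expr, list(offset = 10)))
deparse(expr2)

# Evaluate the rewritten code against a list of values.
eval(expr2, list(pm25 = 4))
```

The same mechanism underlies formula interfaces and packages that translate R expressions into other languages.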