Introduction to OpenRefine and Regular Expressions

Oklahoma State University & University of Central Oklahoma

Nov 5 & 12, 2025

9:30 am - 12:30 pm

Instructors: Bailey Hoffner, Juliet Alavicheh, Kevin Dyke

Helpers: Risa Jensen-Jones, Karl Siewert, Rebekah Silverstein

OSU Registration Page


UCO Registration Page


General Information

The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.

Want to learn more and stay engaged with The Carpentries? Carpentries Clippings is The Carpentries' biweekly newsletter, where we share community news, community job postings, and more. Sign up to receive future editions and read our full archive: https://carpentries.org/newsletter/

Library Carpentry is made by people working in library- and information-related roles to help you:

Library Carpentry introduces you to the fundamentals of computing and provides you with a platform for further self-directed learning. For more information on what we teach and why, please see our paper "Library Carpentry: software skills training for library professionals".

Who: The course is for people working in library- and information-related roles. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Online (Zoom). Get directions with OpenStreetMap or Google Maps.

When: Nov 5 & 12, 2025; 9:30 am - 12:30 pm Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

We are dedicated to providing a positive and accessible learning environment for all. We do not require participants to provide documentation of disabilities or disclose any unnecessary personal information. However, we do want to help create an inclusive, accessible experience for all participants. We encourage you to share any information that would be helpful to make your Carpentries experience accessible. To request an accommodation for this workshop, please fill out the accommodation request form. If you have questions or need assistance with the accommodation form please email us.

Glosario is a multilingual glossary for computing and data science terms. The glossary helps learners attend workshops and use our lessons to make sense of computational and programming jargon written in English by offering it in their native language. Translating data science terms also provides a teaching tool for Carpentries Instructors to reduce barriers for their learners.

Workshop Recordings: Carpentries workshops are designed to be interactive rather than lecture-based, with lessons that build upon one another. To foster a positive online learning environment, we strongly recommend that participants join in real time. As a result, workshop recordings are not recommended and may not be available to learners.

Contact: Please email danielle.kirsch@okstate.edu for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

Day 2 - November 12th

Lesson 1 Regular Expressions
Lesson 2 Matching and Extracting Strings
Lesson 3 Multiple Choice Quiz
Lesson 4 Exercises

Setup

To participate in a Library Carpentry workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common issues that occur during installation as a reference for instructors that may be useful on the Configuration Problems and Solutions wiki page.

Spreadsheet Software

To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, OpenOffice.org, or other programs. Commands may differ a bit between programs, but general ideas for thinking about spreadsheets is the same.

For this lesson, if you don't have a spreadsheet program already, you can use [LibreOffice](https://www.libreoffice.org). It is a free, open source spreadsheet program.

Windows

Mac OS

Linux

The Bash Shell

Bash is a commonly-used shell that gives you the power to do simple tasks more quickly. Please find setup instructions in the lesson.

OpenRefine

OpenRefine is a tool to clean up and organize messy data. Please find instructions to install it and the data used in the lesson in the lesson.

Git

Git is a version control system that lets you track who made changes to what when and has options for easily updating a shared or public version of your code on https://github.com.

Follow the instructions on the lesson to install Git on your system.

You will need an account at github.com for parts of the Git lesson. Basic GitHub accounts are free. We encourage you to create a GitHub account if you don't have one already. Please consider what personal information you'd like to reveal. For example, you may want to review these instructions for keeping your email address private provided at GitHub. You will need a supported web browser.