Introduction to Web Scraping

UC Santa Barbara Library

Feb 27, 2024

1:00 pm - 4:30 pm PST

Instructors: Renata Curty, Seth Erickson, Jose Niño Muriel

Helpers: Kristi Liu

General Information

The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.

Want to learn more and stay engaged with The Carpentries? Carpentries Clippings is The Carpentries' biweekly newsletter, where we share community news, community job postings, and more. Sign up to receive future editions and read our full archive: https://carpentries.org/newsletter/

Where: Room 2509, UCSB Library, 525 U-Cen Rd, Santa Barbara, CA. Get directions with OpenStreetMap or Google Maps.

When: Feb 27, 2024. Add to your Google Calendar.

Requirements: Participants must bring a laptop running macOS, Linux, or Windows (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody.

Materials will be provided in advance of the workshop, and large-print handouts are available on request if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email dreamlab@library.ucsb.edu for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

Setup: Download files required for the lesson

13:00  1. Introduction: What is web scraping?
          What is web scraping and why is it useful?
          What are typical use cases for web scraping?
13:10  2. Selecting content on a web page with XPath
          How can I select a specific element on a web page?
          What is XPath and how can I use it?
13:55  3. Manually scrape data using browser extensions
          How can I get started scraping data off the web?
          How can I use XPath to more accurately select what data to scrape?
15:00  4. Ethics & Legality of Web Scraping
          When is web scraping OK and when is it not?
          Is web scraping legal? Can I get into trouble?
          What are some ethical considerations to make?
          What can I do with the data that I’ve scraped?
15:30  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.


Setup

See setup instructions