Setup

Setup Instructions

To follow this lesson, you will need to make sure the following software is installed on your computer. We will be using the Scraper Chrome extension and Python through Anaconda. After installing Anaconda, you will need to install Scrapy from the command line, using either pip or conda (from the conda-forge channel). Scrapy is an open source framework for extracting data from websites.

Scraper Chrome Extension

For the first half of the lesson, we will use a Chrome browser extension to get started with web scraping. Follow the steps below to install Scraper:

  1. Make sure you have the Chrome browser installed on your computer. If you need to install it, follow the steps on this website and choose your OS.

  2. Open the Chrome Web Store.

  3. Search for “Scraper” in extensions; it should be the first one listed. Alternatively, you may click on this link: Scraper extension.

  4. Click the “Add to Chrome” button.

You will then see a putty knife icon in your browser toolbar.

Using the Command Line with Python, Anaconda, and Scrapy

The second part of the lesson requires the Python programming language and access to a command-line interface (shell) on your computer. If applicable, please log out of any drives on your device (e.g. OneDrive, BoxDrop, etc.) before installing the software.

Prerequisites

This part of the lesson requires some prior knowledge of Python and how to use a shell. If you need help getting started on those topics, we suggest going through the following lessons first (during a workshop or on your own):

Unix Shell: Install software

If you do not already have the shell software installed, you will need to download and install it.

Open a new shell

After installing the software:

  1. Open a terminal. If you’re not sure how to open a terminal on your operating system, see the instructions below.
  2. In the terminal, type cd and then press the Return key. This step makes sure you start with your home folder as your working directory.

In the lesson, you will find out how to access the data files in this folder.

Where to type commands: How to open a new shell

The shell is a program that enables us to send commands to the computer and receive output. It is also referred to as the terminal or command line.

Some computers include a default Unix Shell program. The steps below describe some methods for identifying and opening a Unix Shell program if you already have one installed. There are also options for identifying and downloading a Unix Shell program, a Linux/UNIX emulator, or a program to access a Unix Shell on a server.

If none of the options below address your circumstances, try an online search for: Unix shell [your computer model] [your operating system].

Computers with Windows operating systems do not automatically have a Unix Shell program installed. In this lesson, we encourage you to use an emulator included in Git for Windows, which gives you access to both Bash shell commands and Git.

Once installed, you can open a terminal by running the program Git Bash from the Windows start menu.

For advanced users:

As an alternative to Git for Windows, you may wish to install the Windows Subsystem for Linux, which gives access to a Bash shell command-line tool in Windows 10.

Please note that commands in the Windows Subsystem for Linux (WSL) may differ slightly from those shown in the lesson or presented in the workshop.

For a Mac computer running macOS Mojave or earlier releases, the default Unix Shell is Bash. For a Mac computer running macOS Catalina or later releases, the default Unix Shell is Zsh. Your default shell is available via the Terminal program within your Utilities folder.

To open Terminal, try one or both of the following:

  • In Finder, select the Go menu, then select Utilities. Locate Terminal in the Utilities folder and open it.
  • Use the Mac ‘Spotlight’ computer search function. Search for: Terminal and press Return.

To check which shell your machine is set up to use, type echo $SHELL in your terminal window.

If your machine is set up to use something other than Bash, you can run Bash by opening a terminal and typing bash.

How to Use Terminal on a Mac

The default Unix Shell for Linux operating systems is usually Bash. On most versions of Linux, it is accessible by running the Gnome Terminal, KDE Konsole, or xterm, which can be found via the applications menu or the search bar. If your machine is set up to use something other than Bash, you can run Bash by opening a terminal and typing bash.
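
Once Python is installed (see the Anaconda instructions later on this page), the same check can be made from Python. The snippet below is only a minimal sketch: the helper name shell_name is made up for this illustration, and it assumes the SHELL environment variable is set, which is normally the case on macOS and Linux but may not be on Windows.

```python
import os


def shell_name(shell_path):
    """Return the program name from a shell path such as /bin/zsh.

    shell_name is a hypothetical helper written for this example.
    """
    if not shell_path:
        return None
    return os.path.basename(shell_path)


if __name__ == "__main__":
    # SHELL is usually set on macOS and Linux; it may be missing elsewhere.
    name = shell_name(os.environ.get("SHELL"))
    print("Default shell:", name or "not set")
```

If the reported name is not bash, you can still start Bash by typing bash in the terminal, as described above.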

Installing Python using Anaconda

Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of the scientific packages we use in the lesson individually can be a bit cumbersome, so we recommend the all-in-one installer Anaconda.

Regardless of how you choose to install it, please make sure you install Python version 3.x.
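
Once Python is installed, one way to confirm that you are running a 3.x version is to ask the interpreter itself. The following is a small sketch rather than an official check; the function name is_python3 is made up for this example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is exactly 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.x.y ...".
    print("Running Python", sys.version.split()[0])
    print("Python 3.x detected:", is_python3())
```

Save it to a file of your choice and run it with the same python command you plan to use during the lesson, so that you test the right installation.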

Installing Anaconda

  1. Open https://www.anaconda.com/products/individual in your web browser.
  2. Download the Anaconda Python 3 installer for Windows.
  3. Double-click the executable and install Python 3 using the recommended settings. Make sure that the “Register Anaconda as my default Python 3.x” option is checked; it should be by default in the latest version of Anaconda.
  4. Verify the installation: click Start, then search for and select Anaconda Prompt from the menu. A window should pop up where you can type commands, for example to check your conda installation:

    conda --help
    

Video Tutorial

  1. Visit https://www.anaconda.com/products/individual in your web browser.
  2. Download the Anaconda Python 3 installer for macOS. These instructions assume that you use the graphical installer .pkg file.
  3. Follow the Anaconda Python 3 installation instructions. Make sure that the install location is set to “Install only for me” so Anaconda will install its files locally, relative to your home directory. Installing the software for all users tends to create problems in the long run and should be avoided.
  4. Verify the installation: click the Launchpad icon in the Dock, type Terminal in the search field, then click Terminal. A window should pop up where you can type commands, for example to check your conda installation:

    conda --help
    

Video Tutorial

Note that the following installation steps require you to work from the terminal (shell). If you run into any difficulties, please request help before the workshop begins.

  1. Open https://www.anaconda.com/products/individual in your web browser.
  2. Download the Anaconda Python 3 installer for Linux.
  3. Install Anaconda using all of the defaults for installation.
    • Open a terminal window.
    • Navigate to the folder where you downloaded the installer.
    • Type bash Anaconda3- and press Tab. The name of the file you just downloaded should appear.
    • Press Return
    • Follow the text-only prompts. When the license agreement appears (a colon will be present at the bottom of the screen) press Spacebar until you see the bottom of the text. Type yes and press Return to approve the license. Press Return again to approve the default location for the files. Type yes and press Return to prepend Anaconda to your PATH (this makes the Anaconda distribution your user’s default Python).
  4. Verify the installation: this depends a bit on your Linux distribution, but you will often have an Applications listing in which you can click a Terminal icon. A window should pop up where you can type commands, for example to check your conda installation:

    conda --help
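
If conda --help reports that the command cannot be found, it can help to check whether conda is on your PATH at all. The sketch below uses Python's standard shutil.which for this; the helper name find_tool is made up for this example, and on some setups conda only becomes available after the shell has been initialised or after opening the Anaconda Prompt, so a "not found" result is a hint rather than proof that the installation failed.

```python
import shutil


def find_tool(name):
    """Return the full path of a command-line tool on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    conda_path = find_tool("conda")
    if conda_path is None:
        print("conda was not found on your PATH")
    else:
        print("conda found at", conda_path)
```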
    


Scrapy

Once you have a working installation of Python, the next step is to install Scrapy.

If you have installed Python using the Anaconda distribution as suggested by the Software Carpentry setup instructions, you can easily install Scrapy by doing the following:

  1. Open a new shell (e.g. Terminal on Mac, or the Anaconda command-line tool on Windows)
  2. Type the following:
 conda install -c conda-forge scrapy

Alternatively, if you have another distribution of Python, you can try using pip:

 pip install Scrapy

If you run into issues while installing Scrapy, refer to the official Scrapy install guide or get in touch with your lesson instructor.
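
To confirm that the installation worked, you can check whether Python is able to find the scrapy module. This is a minimal sketch, not part of the official Scrapy instructions; the helper name is_installed is made up for this example, and the script should be run with the same Python environment in which you installed Scrapy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("scrapy"):
        import scrapy

        print("Scrapy version:", scrapy.__version__)
    else:
        print("Scrapy is not installed in this Python environment")
```

If the script reports that Scrapy is missing even though the install command succeeded, you are probably running a different Python than the one conda or pip installed into.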