How I Get Options Data for Free: Web Scraping Tutorial

Options data is extremely valuable for investors and traders looking to analyze the market, backtest strategies, or build financial applications. However, comprehensive historical options data can cost thousands of dollars per year from commercial providers. What if you just want to access options data for personal research or a project?

In this tutorial, I'll show you how to obtain current and historical options data for free using web scraping with Python. We'll walk through the process step-by-step, from finding free data sources to automating the collection of end-of-day options data. By the end, you'll be able to build your own dataset of options prices, greeks, and other metrics.

Why Options Data Is Expensive

Options data is more complex and multidimensional than stock price data. While a stock has a single price at any given time, an underlying asset can have hundreds or thousands of listed options with different strike prices and expiration dates. Options metrics like implied volatility, greeks, and open interest also change constantly as contracts are created, traded, and expired.

Collecting, validating, and distributing all of this data is an intensive process. The Options Price Reporting Authority (OPRA) disseminates the consolidated real-time feed of quotes and trades from the US options exchanges, and vendors like Cboe LiveVol aggregate and normalize that data into products sold at a premium to institutional investors.

Retail-oriented brokerages and websites offer limited options data, often with compromises like delayed quotes, limited history, or only end-of-day summaries. Premium APIs from retail brokers can provide more real-time data, but with usage restrictions and costs that add up quickly for extensive backtesting or application development.

Free and Low-Cost Options Data Sources

While real-time options data remains expensive, there are a number of free and affordable sources for end-of-day options quotes and greeks:

Yahoo Finance: Yahoo provides free delayed quotes for options, including bid/ask prices, last trades, volume, and implied volatility. Every listed expiration can be viewed, each on its own page, making Yahoo one of the most accessible sources of options data.

CBOE: The Chicago Board Options Exchange offers end-of-day quotation summaries as downloadable text files. Files since 2008 are available, containing options traded across all US exchanges. However, the raw files require parsing and do not include calculated greeks.

IEX Cloud: Investors Exchange LLC provides an easy-to-use API with end-of-day options quotes as part of their free tier. IEX data includes bid/ask quotes, last trade prices, volume, and open interest for US-listed options.

Discount Brokers: Many retail brokers offer APIs with options data bundled with other market data. For example, Tradier provides delayed quotes, greeks, and market depth snapshots starting at $10 per month. TD Ameritrade's API offers similar data for free to account holders.

While these sources provide a good starting point, they have limitations like delayed quotes, limited metrics, or data going back only a few years. By using web scraping, we can build more extensive datasets from a combination of free sources.

Web Scraping Options Data with Python

Web scraping involves programmatically downloading and parsing data from websites. Python is a popular language for web scraping due to its simplicity and powerful libraries for making HTTP requests and extracting data from HTML pages.

Let's walk through how to scrape end-of-day options data from Yahoo Finance using Python, BeautifulSoup, and Requests. This example will fetch the latest quotes and implied volatility for every contract in a symbol's options chain.

Initial Setup

First, make sure you have Python 3.x installed. In your terminal, create a new directory for the project and navigate into it:

mkdir options-scraper 
cd options-scraper
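
Install the third-party libraries used in this tutorial (the PyPI package for BeautifulSoup is beautifulsoup4; pytz comes into play later for scheduling):

pip install requests beautifulsoup4 pytz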

Create a new Python file called scraper.py and open it in your preferred code editor. At the top of the file, import the required libraries:

import requests
from bs4 import BeautifulSoup
import datetime

Fetching the Options Page

To download the HTML of the Yahoo Finance options page, we'll use the Requests library to send an HTTP GET request to the URL. Add this function to your Python script:

def fetch_options_page(symbol):
    url = f"https://finance.yahoo.com/quote/{symbol}/options"

    # Yahoo may block the default Requests User-Agent, so send a
    # descriptive browser-style header identifying the scraper
    headers = {"User-Agent": "Mozilla/5.0 (options-scraper; personal research)"}
    response = requests.get(url, headers=headers)

    if response.status_code != 200:
        raise ValueError(f"Failed to load page, status code: {response.status_code}")

    return BeautifulSoup(response.content, "html.parser")

This function takes a stock symbol, fetches the corresponding options page, and returns the parsed HTML content as a BeautifulSoup object. A non-200 response raises a ValueError, while connection failures will surface as Requests exceptions.
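
For a quick sanity check, you can call it from a Python shell and print the page title (SPY is just an example symbol):

soup = fetch_options_page("SPY")
print(soup.title.get_text())  # prints the page title if the fetch succeeded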

Extracting Options Data

With the page HTML loaded, we can use BeautifulSoup to locate and extract the data we want. Yahoo Finance options pages have two HTML tables, one for calls and one for puts. Each row represents an options contract with pricing data and greeks in different cells.

Let's parse the call options table into a Python list of dictionaries:

def parse_options_table(table):
    rows = table.find_all("tr")

    data = []

    for row in rows[1:]:  # rows[0] is the header row
        cells = row.find_all("td")

        # Skip rows that don't have the full set of columns
        if len(cells) < 11:
            continue

        option = {
            "contract": cells[0].get_text(strip=True),
            "last_trade": cells[1].get_text(strip=True),
            "strike": cells[2].get_text(strip=True),
            "last": cells[3].get_text(strip=True),
            "bid": cells[4].get_text(strip=True),
            "ask": cells[5].get_text(strip=True),
            "change": cells[6].get_text(strip=True),
            "percent_change": cells[7].get_text(strip=True),
            "volume": cells[8].get_text(strip=True),
            "open_interest": cells[9].get_text(strip=True),
            "implied_volatility": cells[10].get_text(strip=True)
        }

        data.append(option)

    return data

Here we find each <tr> row element in the table, extract the text content from the cells, and build a dictionary mapping the column names to values. The dictionaries are collected into a list to represent the full dataset.
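
Note that the cell indexes are hard-coded, so if Yahoo ever reorders its columns the fields would be silently mislabeled. A more defensive variant (a sketch under the same assumptions about the table markup) keys each dictionary off the header row instead:

def parse_options_table_dynamic(table):
    rows = table.find_all("tr")

    # Use the header cells as dictionary keys, e.g. "Strike", "Bid", "Ask"
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

    data = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == len(headers):
            data.append(dict(zip(headers, cells)))

    return data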

To tie it all together, let's add a main block to load the options page and print out the extracted call options data:

if __name__ == "__main__":
    symbol = "SPY"

    options_page = fetch_options_page(symbol)

    calls_table = options_page.find("table", {"class": "calls"})
    calls_data = parse_options_table(calls_table)

    print(f"{len(calls_data)} call options found for {symbol}:")
    print(calls_data)

When you run the script with python scraper.py, you should see the latest call options chain for SPY printed out. Congrats, you just scraped options data from the web!

Scraping put options data is done similarly using the puts table:

puts_table = options_page.find("table", {"class": "puts"})
puts_data = parse_options_table(puts_table)
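
Keep in mind that every scraped value is a raw string, with formatting like thousands separators and percent signs. A small helper (my own addition, not part of the page data) can coerce the numeric fields before analysis:

def to_float(text):
    # Strip formatting from strings like "1,234.50" or "25.10%"
    cleaned = text.replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. "-" placeholders for missing values

iv = to_float(calls_data[0]["implied_volatility"])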

The Ethics and Legality of Web Scraping

Web scraping occupies a gray area that is worth considering before you start any scraping project. In general, scraping publicly available data for personal, non-commercial use is considered acceptable. However, many websites prohibit scraping in their terms of service and may try to block excessive or aggressive scraping.

Some tips for staying on the right side of web scraping:

  • Respect robots.txt files that indicate what pages should not be scraped
  • Don't hammer servers with an excessive rate of requests
  • Set a descriptive User-Agent header identifying your scraper
  • Only scrape and save data you really need
  • Don't share, sell, or monetize scraped data without permission
  • Use official APIs when available instead of scraping

It's also crucial to check the terms of service for any website you want to scrape. Yahoo Finance permits limited personal use of data but requires a commercial license for extensive or business use.

Disclaimer: I am a developer, not a lawyer. This is general advice but not a substitute for professional legal counsel.

Scheduling Recurring Scraping Jobs

To get daily options data, we'll want to run our web scraper automatically at a regular interval, such as after market close each trading day. A simple way to do this is to schedule the script as a cron job on a Linux server.

Here's how we'd modify the script to run as a daily cron job:

  1. Refactor the core logic into a scrape_options_data function that fetches data for one or more symbols and saves it to CSV files.
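
A minimal sketch of that function, reusing fetch_options_page and parse_options_table from earlier (the per-day CSV naming scheme is my own choice, not anything standard):

import csv
from datetime import date

def scrape_options_data(symbols):
    for symbol in symbols:
        page = fetch_options_page(symbol)

        for side in ("calls", "puts"):
            table = page.find("table", {"class": side})
            if table is None:
                print(f"No {side} table found for {symbol}, skipping.")
                continue

            rows = parse_options_table(table)
            if not rows:
                continue

            # One file per symbol, side, and trading day, e.g. SPY_calls_2024-01-05.csv
            filename = f"{symbol}_{side}_{date.today().isoformat()}.csv"
            with open(filename, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=rows[0].keys())
                writer.writeheader()
                writer.writerows(rows)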

  2. Add logic to check whether the market is currently open, so the scraper can wait for the close instead of collecting incomplete intraday data or wasting requests when new data is not expected.

import pytz
from datetime import datetime

def is_market_open(timezone="US/Eastern"):
    tz = pytz.timezone(timezone)
    now = datetime.now(tz)

    # TODO: check if it's a market holiday

    # Regular session: 9:30 AM to 4:00 PM Eastern, Monday through Friday
    is_weekday = now.weekday() < 5
    after_open = (now.hour, now.minute) >= (9, 30)
    before_close = now.hour < 16
    return is_weekday and after_open and before_close

  3. Invoke the scraper from a main function that either starts scraping immediately or waits until after the next market close.

import math
import time

if __name__ == "__main__":
    if is_market_open():
        # Compute the time of the next market close (4:00 PM Eastern)
        now = datetime.now(pytz.timezone("US/Eastern"))
        next_close = now.replace(hour=16, minute=0, second=0, microsecond=0)

        delay = math.ceil((next_close - now).total_seconds())
        print(f"Market is open. Scheduling job to run in {delay} seconds.")

        time.sleep(delay)

    print("Starting scraping job...")
    scrape_options_data(["SPY", "AAPL", "TSLA"])
    print("Job completed.")

  4. Set up a cron job on your server to execute the Python script every weekday at 4:05 PM Eastern (a few minutes after market close). Edit your user's crontab with crontab -e and add an entry like:

5 16 * * 1-5 /path/to/python /path/to/options_scraper.py >> /path/to/scraper.log 2>&1

This will run the scraper every Monday through Friday at 4:05 PM and append the output to a log file. Note that cron uses the server's local time, so either set the server timezone to US Eastern or shift the hour accordingly. Adjust the timing or frequency to your needs and be sure to monitor the logs for any issues.

Storing and Using Options Data

As mentioned earlier, our script writes the scraped data to CSV files for easy storage and access. This allows importing the historical data into Excel, Pandas dataframes, or another analysis tool.
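
For example, assuming the per-day file naming sketched earlier, you could stitch a symbol's scraped call files into a single historical dataframe:

import glob
import pandas as pd

# Combine every scraped SPY calls file into one dataframe
files = sorted(glob.glob("SPY_calls_*.csv"))
history = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
print(history.head())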

Some ideas for using your scraped options data:

  • Use the options greeks and volatility metrics to analyze market sentiment over time
  • Backtest options trading strategies using the historical prices
  • Train machine learning models to predict future options prices
  • Build visualizations of options pricing trends

If you intend to build an application using options data, you‘ll likely want to import the CSVs into a database like PostgreSQL for efficient querying. You could further automate loading the CSVs from the scraper into the database as part of a daily ETL pipeline.
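
As a rough illustration of that load step, here is one way to bulk-load a CSV with psycopg2 (the connection string and option_quotes table are hypothetical, and the table's columns would need to match the CSV):

import psycopg2

def load_csv(csv_path, table="option_quotes"):
    conn = psycopg2.connect("dbname=options user=postgres")
    with conn, conn.cursor() as cur, open(csv_path) as f:
        # COPY streams the whole file in one round trip, much faster than row-by-row INSERTs
        cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)", f)
    conn.close()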

Next Steps

We‘ve covered the basics of obtaining free options data through web scraping using Python and scheduling the script to fetch data daily. To further improve your options scraper, you might:

  • Scrape data for more symbols or from additional sources
  • Parse and calculate more options metrics like greeks or spreads
  • Implement robust error handling and alerting for scraping issues
  • Develop a web UI for browsing and visualizing the collected data
  • Explore official options data APIs for more flexible querying

As you can see, web scraping opens up many possibilities for working with options data on your own terms without paying for expensive data subscriptions. While real-time data remains out of reach, a few lines of Python can easily fetch end-of-day options quotes for research and personal projects. Just be sure to scrape responsibly and always check the terms of any website or API.

Happy scraping!
