Spreadsheets getting you down? Converting row data to JSON trees is a breeze with treeize

If you‘ve worked on a modern web application, you‘ve almost certainly dealt with JSON. According to this analysis by W3Techs, JSON is used by over 63% of all websites. Its lightweight, flexible structure makes it the data format of choice for everything from API responses and configuration files to NoSQL databases like MongoDB.

But despite JSON‘s ubiquity, developers still frequently find themselves needing to reshape data from other sources into a JSON-friendly hierarchical format. A common culprit is spreadsheet data. While Excel files are great for non-technical users, their tabular shape is often at odds with the nested objects and arrays that JSON and JavaScript work so well with.

Imagine you receive book data in a spreadsheet with columns for author, title, and category. To use it in your web app, you want to convert it to a hierarchical JSON structure grouped by author and category, like:

{
  "Asimov, Isaac": {
    "Science Fiction": [
      "Foundation",
      "I, Robot"
    ],
    "Science": [
      "The Gods Themselves" 
    ]
  },
  "Doyle, Arthur": {
    "Mystery": [
      "A Study in Scarlet",
      "The Sign of Four"
    ], 
    "Detective Fiction": [
      "The Hound of the Baskervilles"
    ]
  }  
}

Doing this conversion by hand can be a real pain. You‘d need to iterate over each row, checking if the author and category already exist in the output object and creating them if not, then pushing the title to the appropriate array. With a large spreadsheet, this bespoke reshaping code can quickly get messy and difficult to maintain.

The treeize Way

Luckily, the treeize library makes this restructuring process much simpler. Rather than manually building up the hierarchy with complex logic, you can declaratively specify the JSON output you want using a special syntax right in your column names.

Here‘s how you might convert a books spreadsheet to a hierarchical JSON object with treeize:

const Treeize = require(‘treeize‘);

// Assuming booksData is an array of row objects 
const tree = new Treeize();

tree.grow(booksData.map(row => ({  
  authorName: `${row.lastName}, ${row.firstName}`,
  ‘categories(category)‘: row.category,  
  ‘categories(titles)‘: row.title
})));

const jsonOutput = tree.getData();

Let‘s break this down. The tree.grow() method takes an array of objects to reshape. The keys of these objects use parentheses to indicate grouping columns (authorName is used for the top-level grouping). For properties that should be arrays like titles, we put the plural key in parentheses.

Under the hood, treeize is using this declarative key structure to convert the flat row data into a hierarchical object. According to the source code, it does this by splitting the keys on the parentheses to determine the nested path for each row‘s values. This results in a simpler, more linear algorithm compared to the branching logic needed for a manual reshaping.

While treeize handles the heavy lifting, it‘s still important to make sure your input data is clean and consistent. Extra whitespace, misspelled column names, or inconsistent values can all lead to malformed output.

I recommend doing any necessary data filtering and validation before passing the rows to treeize. Here are a few libraries that can help with that:

Library Use Case
validator.js String validation and sanitization
xlsx Parsing Excel files
data-forge Data transformation and analysis

A Practical Example: Visualizing Book Data with D3.js

To illustrate how treeize can simplify real-world data transformation tasks, let‘s walk through converting a spreadsheet of book data into a format suitable for visualizing with the D3.js library. D3 is a powerful tool for building interactive charts and graphs in the browser, but it expects hierarchical data.

Suppose we have an Excel file with columns for book title, author name, publication year, page count, category, and rating. Our goal is to create a treemap showing the relative popularity of each book category, with individual books sized by page count and colored by rating.

First we‘ll load the Excel file and convert it to an array of row objects using the xlsx library:

const XLSX = require(‘xlsx‘);

const workbook = XLSX.readFile(‘books.xlsx‘);  
const sheet = workbook.Sheets[workbook.SheetNames[0]];
const rows = XLSX.utils.sheet_to_json(sheet);

Next, we‘ll define a mapping function to reshape each row into the structure we want for our treemap. We‘ll use treeize‘s declarative key syntax to group the data by category while keeping the relevant fields for each book:

function shapeRow(row) {
  const { title, authorName, year, pages, rating } = row;
  const size = parseInt(pages); 

  return {
    name: title,
    size,
    year: parseInt(year),
    rating: parseFloat(rating), 
    ‘children(name)‘: row.category,
    ‘children(children)‘: {
      name: authorName, 
      size, 
      year: parseInt(year),
      rating: parseFloat(rating),      
    }
  }
}

A few things to note here:

  • We parse the numeric columns (pages, rating, year) to ensure they are valid numbers in the output
  • The book title and author are set as the name property since this is the default label key for D3‘s treemap
  • We use children as the grouping key for categories and books, as this is what D3‘s hierarchy utilities expect

Finally, we‘ll pass our transformed rows to treeize and convert the result to a D3 hierarchy object:

const treeize = new Treeize();
const categoryTree = treeize.grow(rows.map(shapeRow)).getData();

const root = d3.hierarchy(categoryTree)  
  .sum(d => d.size)
  .sort((a, b) => b.value - a.value);

The sum and sort calls tell D3 how to size and order the treemap nodes.

We can now render our interactive visualization using a few more lines of D3 (omitted for brevity, but see the D3 treemap docs for details). The resulting graphic might look something like this:

D3.js treemap showing book categories sized by total page count

Hovering over each category rectangle would reveal more info about its cumulative metrics, while zooming in would show the individual books grouped within the category, sized and colored based on the properties we specified.

By automating the spreadsheet-to-JSON conversion process with treeize, we‘re able to spend less time worrying about data manipulaation and more time crafting compelling visualizations and user experiences.

Conclusion

Reshaping tabular data into hierarchical JSON is a common challenge for web developers working with non-programming data sources. The treeize library provides an elegant, declarative solution to this problem that can save significant time and effort compared to manual restructuring.

By specifying the desired output structure using a simple key syntax right in our column names, we can leverage treeize to transform large spreadsheets into ideally-shaped JSON trees with minimal custom code. This frees us up to focus on higher-level application logic and data presentation.

While the Excel-to-D3 example illustrates one potential use case, treeize‘s declarative approach is flexible enough to handle all sorts of data reshaping needs. Whether you‘re building a RESTful API, importing config files, or analyzing data, treeize is a handy tool to have in your developer toolbox.

So the next time you find yourself staring down an unwieldy spreadsheet, give treeize a shot. Your future self (and your fellow devs) will thank you.

Further Reading:

Similar Posts