d3 Scalability – Virtual Scrolling for Large Visualizations

By | May 17, 2014

The visualizations that I work on have large datasets that need to be rendered quickly with as little DOM overhead as possible. One of the cool things about d3 is that it renders the data present at any given moment, along with changes as data enters or exits. However, if you have a large dataset, d3 can kill the browser under the weight of all the DOM nodes that it is attempting to create. Here is a demo of what I came up with to solve this problem:

Virtual Scrolling Demo

You can explore the GIST code here.

Virtual rendering has been around for a long time and is used to varying degrees on almost every known development platform to provide access to large datasets without slowing down the UI. For web developers, showing millions of rows of data is easily achieved by using a JavaScript data grid that can pull in those rows as the user scrolls through the grid.

I’m definitely not the first one to attempt the implementation of virtual rendering in d3. I came across a d3 Google Group post where using d3 for virtually rendered grids was being discussed and found that Jason Davies had toyed with this a couple of years back. There were many other experiments as well, but I needed one that could render more than just a grid or simple chart.

My solution began with Jason’s longscroll.js class, which I heavily modified to allow the user to specify enter/update/exit methods that the plugin uses for rendering rows. The general idea is that you determine how much data can be shown on the screen at any given time and then slice out that amount of data from the master dataset and render it. As the user drags the scrollbar along, you continue to slice out corresponding ranges of data from the dataset and render them on screen. Ideally, you re-use the elements already created on the screen and simply swap the datum that backs each node for a new one from the sliced range, which avoids the expense of adding and removing DOM elements.
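The slicing step can be sketched in a few lines. This is only an illustration of the idea, not the plugin's code: the function name `visibleRange`, its parameters, and the sample numbers are all made up for the example, and it assumes every row has the same height.

```javascript
// Hypothetical helper (not from the plugin): given the scroll position,
// return the half-open range [first, last) of rows that are on screen.
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows) {
  // Index of the row that crosses the top edge of the viewport.
  const first = Math.max(0, Math.min(totalRows, Math.floor(scrollTop / rowHeight)));
  // The extra row covers one that is only partially visible at the bottom.
  const count = Math.ceil(viewportHeight / rowHeight) + 1;
  const last = Math.min(totalRows, first + count);
  return { first, last };
}

// Assumed sample data: 100,000 rows, 30px each, in a 400px tall viewport.
const data = Array.from({ length: 100000 }, (_, i) => ({ id: i }));
const { first, last } = visibleRange(3000, 400, 30, data.length);
const rows = data.slice(first, last); // only these rows get rendered
```

In d3, binding that slice with `selection.data(rows)` joins by index by default, so as long as the slice length stays the same the existing row elements are updated in place rather than removed and re-created.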

For this plugin, here is what you provide:

  1. a viewport reference – this is the DOM element that will contain the SVG. I write everything with variable resizing in mind so this viewport’s dimensions can change and the plugin will adapt if the resize boolean is set to true when the virtual scrolling action is requested
  2. an svg reference – this is sized based on the rowheight * the total data size, thus creating an accurately sized scroll bar
  3. an svg child group reference – this will be used to render the content and will be transformed within the svg to keep the contents visible.
  4. a total data size – you may wonder why the plugin cannot just look at the size of the data being provided, but if you plan to page in only a subset of the total data, that would not size the scroll bar correctly; the code therefore sets this value independently of the size of the data the plugin knows about
  5. an enter/update/exit method – this will be used to render the rows on the screen
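
To make items 2 and 3 concrete, here is a rough sketch of the arithmetic involved. It is not the plugin's actual code: `layoutFor` and its parameters are hypothetical names, and the d3 calls in the trailing comment assume selections called `svg` and `group` already exist.

```javascript
// Hypothetical helper, not part of the plugin: derives the svg height and
// the child group's transform from the current scroll position.
function layoutFor(scrollTop, rowHeight, totalRows) {
  // Full height of the virtual content. Sizing the svg to this value is what
  // makes the scrollbar proportional to the whole dataset.
  const svgHeight = rowHeight * totalRows;
  // First row at or above the top edge of the viewport, clamped to the data.
  const maxFirst = Math.max(0, totalRows - 1);
  const firstRow = Math.min(maxFirst, Math.max(0, Math.floor(scrollTop / rowHeight)));
  // Shift the group down so its first rendered row sits where firstRow belongs.
  const groupTransform = "translate(0," + firstRow * rowHeight + ")";
  return { svgHeight, firstRow, groupTransform };
}

const layout = layoutFor(3010, 30, 100000);
// With d3 selections named svg and group (assumed), this could be applied as:
//   svg.attr("height", layout.svgHeight);
//   group.attr("transform", layout.groupTransform);
```

The total row count is passed in separately rather than read from the loaded data, which matches item 4: the scrollbar reflects the full dataset even when only part of it has been paged in.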

There are several aspects to this solution that should be pointed out:

First, your visualization needs consistently sized rows so you can divide the visible height of the viewport by that size and calculate the range of visible rows. You also use the row height multiplied by the length of the entire dataset to size the scrolling DOM node container so that the scrollbar is sized correctly. I imagine you can implement a version with nodes of different heights, but you will need to know the height of the un-rendered rows above and below the visible range if you want to size the scroll bar correctly.
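
For the variable-height case, one possible approach (an assumption on my part, not something the plugin does) is to keep a running total of row heights and binary-search it for the scroll position. The names `buildOffsets` and `firstVisibleRow` are invented for this sketch, and it assumes the height of every row is known up front.

```javascript
// offsets[i] is the y position of the top of row i;
// offsets[heights.length] is the total content height (for the scrollbar).
function buildOffsets(heights) {
  const offsets = new Array(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Binary search for the last row whose top edge is at or above scrollTop.
function firstVisibleRow(offsets, scrollTop) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last real row
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (offsets[mid] <= scrollTop) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

const offsets = buildOffsets([20, 50, 30, 40]); // assumed sample heights
const totalHeight = offsets[offsets.length - 1];
const firstRow = firstVisibleRow(offsets, 75);
```

The catch is the one mentioned above: the offsets can only be built if the heights of the un-rendered rows are available without rendering them.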

Second, there is no need to load the entire dataset into memory if you have the ability to get the total size of the dataset from the server and then page in subsets of data as they are needed.
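
A minimal sketch of the bookkeeping this implies, assuming the server hands out fixed-size pages: work out which pages cover the visible rows and request only the ones not yet cached. The page size, the `Map` cache, and both function names are assumptions for illustration; the actual fetch is left out because it depends on your server.

```javascript
const PAGE_SIZE = 200; // assumed number of rows per server page

// Page indices that cover the half-open row range [first, last).
function pagesForRange(first, last, pageSize) {
  if (last <= first) return [];
  const pages = [];
  const firstPage = Math.floor(first / pageSize);
  const lastPage = Math.floor((last - 1) / pageSize);
  for (let p = firstPage; p <= lastPage; p++) {
    pages.push(p);
  }
  return pages;
}

// Pages from the list that are not in the cache yet.
function missingPages(cache, pages) {
  return pages.filter((p) => !cache.has(p));
}

const cache = new Map();
cache.set(0, []); // pretend page 0 has already been loaded
const needed = pagesForRange(190, 215, PAGE_SIZE); // rows straddle pages 0 and 1
const toFetch = missingPages(cache, needed);       // only page 1 is missing
```

Once a missing page arrives it would be stored in the cache under its page index, and the visible rows re-rendered from the cached pages.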

Third, you can also use this solution to render tree structures but things get more complicated if you plan to page in that data on an as-needed basis. More on that in a future post once I get it all figured out. 🙂

5 thoughts on “d3 Scalability – Virtual Scrolling for Large Visualizations”

  1. mateolan

    awesome. Looking forward to seeing what you come up with in terms of tree/directed graphs…perhaps a controller like a group of checkboxes could provide the subset filtering mechanism in the same way that the scrollbar/window size does in this example. Thanks for a great, well-detailed explanation of the issues.

    1. Bill (post author)

      Thanks. The problem with trees is that, unless you have the server flatten the data for you upon an expansion action in the UI, you have to page in the children yourself. For example, if you have 500 rows at the root of your dataset and each row has 500 children and those children each have 500 children, once the user expands (for example) row 5, the plugin needs to begin adding children from row 5’s dataset instead of continuing to page in rows from the root dataset. It gets more complicated if you have a powerful data-paging capability on the server where you can bring in small amounts of data at a time from whatever part of the tree you want. Then again, maybe I’m making it harder than it has to be…

  2. Pingback: Bill White's Blog | “d3 in 3D” Presentation Slides and Examples from Austin d3 Meetup

  3. Chris

    Do you have an example of this scrolling for columns rather than rows? i.e. horizontal rather than vertical scrolling.

