3.0 #101

Closed
wants to merge 24 commits into from


@bmschmidt bmschmidt commented Dec 7, 2023

Major refactor.

3.0.0 includes a number of pent-up breaking changes. The underlying motivation for many of these is to let the library fully pass TypeScript compilation, with all the stability benefits that provides.

Breaking changes:

  1. The library is now structured as named exports,
    rather than a single default export. Instead of

    import Scatterplot from 'deepscatter';

    a typical first line will be

    import { Scatterplot } from 'deepscatter';

    This allows the export of several types for advanced scatterplot functions that we've found useful at Nomic. The initial set of exported items is {Dataset, Bitmask, Scatterplot}.

  2. The distinction between QuadTile and ArrowTile
    has been eliminated in favor of Tile, and with it the need to provide
    generics around them through the system. Similarly, QuadTileDataset and ArrowDataset are both removed in favor of Dataset.
    Instead, the TileProxy object is used to provide a wrapper that can turn anything into a
    dataset. Although datasets are presumed to be quadtiles right now, formally they can be
    any collection of arrow batches structured as a tree. (This is increasingly how I've come to think of the data parts of deepscatter: as a system for navigating dataframes that consist of trees rather than of linear lists of points.)

  3. Deepscatter no longer accepts strings as direct
    arguments to Scatterplot.plotAPI in places where they were previously cast to functions
    as lambdas, because linters rightfully get crazy mad about the unsafe use of eval. If
    you want to use deepscatter in scrollytelling
    contexts where defining functions as strings inside JSON is convenient (I still will do this myself in static sites), you must turn them
    into functions before passing them into deepscatter.

  4. Shortcuts for passing position and position0 rather than naming the x and y dimensions explicitly have been removed.

  5. The behavior of categorical scales has been tightened in certain circumstances; as a result, places where it was previously possible to treat categorical scales as numbers (referring to the underlying ints) may no longer work. I am not aware of specific such issues at the moment, and will respond promptly to address any that come up.

  6. Dataset and Tile objects can now be instantiated with a manifest that lists all the tiles in a dataset. When passed, this allows a dataset to instantiate all tiles at creation time without actually loading any data. This is a major change for any code that accesses the Tile.record_batch attribute, because it may now error on well-formed tiles: the presence of data is no longer necessary for something to be a Tile. Additionally, the Tile.ready promise has been retired; instead, to check whether the necessary data has loaded, you must explicitly check Tile.hasLoadedColumn('foo').

  7. (Another way of expressing this change is that where previously there was a 'primary record batch' and 'sidecar batches', in version 3.0 of deepscatter this distinction is much less important; it is possible, for example, to draw a scatterplot without loading the x and y columns if other columns are passed to encoding.x and encoding.y.)

  8. The tile prioritization rules which previously applied to core tiles in a dataset now apply to all sidecars as well.

  9. It is possible to aggressively load any columns to any depth in the dataset without loading other data using Dataset.spawnDownloads() and Dataset.runDownloads().

  10. The Dataset object is now more independent of the scatterplot, to the point that it can independently run in a non-browser environment like Node. See the unit tests for an example of this.
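For item 3, here is a minimal sketch of one way to turn string lambdas stored in JSON into real functions before handing the result to Scatterplot.plotAPI. The helpers `compileLambda` and `hydrateLambdas`, the `lambda` key, and the shape of `spec` are all assumptions made for illustration and are not part of deepscatter. The sketch still evaluates strings (via `new Function`), so it should only be run on JSON you wrote yourself.

```typescript
// Hypothetical helpers, not part of deepscatter.

// Compile a trusted source string such as "d => d > 1900" into a function.
function compileLambda(source: string): (...args: unknown[]) => unknown {
  // `new Function` evaluates the string as an expression; trusted input only.
  const fn: unknown = new Function(`"use strict"; return (${source});`)();
  if (typeof fn !== 'function') {
    throw new TypeError(`Not a function expression: ${source}`);
  }
  return fn as (...args: unknown[]) => unknown;
}

// Walk a parsed JSON value and replace every string stored under `key`
// (assumed here to be "lambda") with the compiled function.
function hydrateLambdas(value: unknown, key = 'lambda'): unknown {
  if (Array.isArray(value)) {
    return value.map((v) => hydrateLambdas(v, key));
  }
  if (value !== null && typeof value === 'object') {
    const input = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(input)) {
      const v = input[k];
      out[k] = k === key && typeof v === 'string'
        ? compileLambda(v)
        : hydrateLambdas(v, key);
    }
    return out;
  }
  return value;
}

// Illustrative spec as it might sit in a static site's JSON file.
const spec: unknown = JSON.parse(
  '{"encoding": {"filter": {"field": "year", "lambda": "d => d > 1900"}}}'
);

const hydrated = hydrateLambdas(spec) as {
  encoding: { filter: { field: string; lambda: (d: number) => boolean } };
};
```

After this step `hydrated.encoding.filter.lambda` is a function rather than a string, so the object no longer relies on deepscatter to cast it.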
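For item 6, code that used to assume every tile carries data can guard on Tile.hasLoadedColumn instead. The sketch below uses a small structural interface, `TileLike`, and mock tiles so that it runs on its own; `TileLike`, `tilesWithColumns`, `mockTile`, and the `key` field are hypothetical names for this example, and only `hasLoadedColumn` comes from the change described above.

```typescript
// Minimal structural stand-in for a tile, for illustration only.
interface TileLike {
  key: string; // hypothetical identifier used in this demo
  hasLoadedColumn(name: string): boolean;
}

// Return only the tiles on which every requested column has loaded,
// so that later code never touches data that is not there yet.
function tilesWithColumns<T extends TileLike>(
  tiles: readonly T[],
  columns: readonly string[]
): T[] {
  return tiles.filter((tile) =>
    columns.every((column) => tile.hasLoadedColumn(column))
  );
}

// Mock tiles standing in for a manifest-built dataset in which
// some tiles exist but hold no data yet.
function mockTile(key: string, loaded: string[]): TileLike {
  return {
    key,
    hasLoadedColumn: (name: string) => loaded.indexOf(name) !== -1,
  };
}

const tiles: TileLike[] = [
  mockTile('0/0/0', ['x', 'y']),
  mockTile('1/0/0', []),
  mockTile('1/1/0', ['x']),
];

const readyKeys = tilesWithColumns(tiles, ['x', 'y']).map((t) => t.key);
```

Here only the first mock tile has both columns, so it is the only one that passes the guard; the other two are valid tiles that simply have no usable data yet.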

@bmschmidt bmschmidt marked this pull request as ready for review May 24, 2024 16:10
@bmschmidt bmschmidt closed this May 24, 2024