Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems initializing Models using BokehJS without Python #13732

Open
ianthomas23 opened this issue Mar 2, 2024 · 4 comments · May be fixed by #13872
Open

Problems initializing Models using BokehJS without Python #13732

ianthomas23 opened this issue Mar 2, 2024 · 4 comments · May be fixed by #13872

Comments

@ianthomas23
Copy link
Member

I have been experimenting using BokehJS without Python in TypeScript projects using webpack, React, etc, and as Jupyter extensions. Here is a demo of a WIP frontend-only Jupyter extension using BokehJS:
jupyter_bokehjs.webm

Note that I am talking about explicitly creating BokehJS objects in the browser rather than serving Bokeh documents via say Flask and the BokehJS code decoding the JSON received from the server and using embed. And I am talking about using BokehJS's TypeScript code rather than just JavaScript which can be achieved more simply using the minified bundles from CDN. I am using the BokehJS 3.4.0 dev/RC NPM packages as they have the dependencies specified which is really helpful.

There is a problem with the design of the class hierarchy of BokehJS for this use case which I would like to fix. The HasProps constructor, under certain circumstances, calls finalize() here

this.finalize()

which calls initialize()
initialize(): void {}

which is overridden in various derived classes such as ColumnarDataSource
override initialize(): void {
super.initialize()
this._select = new Signal0(this, "select")
this.inspect = new Signal(this, "inspect")

and here initialises two member variables.

So we have a derived class constructor that calls a base class constructor that calls a derived class function to initialise derived class member variables. Calling an overridden function from a base class constructor is considered bad practice as the member variables we are trying to initialise have not necessarily been allocated yet. Here is some simple TypeScript code to illustrate the scenario:

abstract class Base {
    constructor() {
        this.initialize()
    }

    initialize(): void {}
}

class Derived extends Base {
    variable: number

    constructor() {
        super()
        console.log("End of Derived.constructor, this.variable =", this.variable)
    }

    override initialize(): void {
        super.initialize()
        this.variable = 23
        console.log("End of Derived.initialize, this.variable =", this.variable)
    }
}

In some situations I have tried, this code works as expected. But in others variable is set to 23 but the variable that clients can access remains undefined. This is pretty much a deal-breaker for my various experiments.

In most Bokeh usage this is not a problem as the HasProps initialization is deferred if we are decoding a JSON document, by which time we are outside of the object constructors and all member variables have been allocated. When running the BokehJS test suite locally and in CI is does appear to be a problem either. I do not understand this, but it gives me hope that there might be some TS compiler options that give a reliable solution.

I have created a little repo https://github.com/ianthomas23/bokehjs-expts to demonstrate. The webpack-simple directory uses the Derived and Base code from above. It is possible to avoid the problem here by targetting a different ES version as mentioned in the various READMEs. The webpack-bokehjs directory is a BokehJS example which you can get part-working by forcing the CDS to be initialized again but eventually you hit the problem as DataRange1d has similar problematic initialization of member variables. Here it cannot be worked around using a different ES version.

If we had to redesign the code to avoid calling overridden functions in base class constructors, how would we do it?

Option 1

If a Derived.initialize needs to initialize member variables, call it from Derived.constructor not Base.constructor. To do this we would have to prevent Base.constructor from calling it first. The only communication between Derived and Base here is via constructor arguments, so we'd have to have a defer_initialization boolean passed down the constructor hierarchy. This is probably not acceptable as the argument would become part of the public API when it should be an internal implementation detail.

Option 2

Add a global (?) flag for deferring all initialization which is explicitly set in client code. We would have to temporarily store a sequence of all objects created this way so that when show or whatever is called, objects are popped off it and initialized then. This would be fine for single-shot plots but if we are doing clever stuff with callbacks to create new objects we might have to perform the "check and initialize sequence" in a number of places.

Option 3

Move the initialization of member variables outside of the initialize() functions. This effectively means we no longer override HasProps.initialize(). Those member variables would have to be initialized some other way, such as when they are first needed. DataRange1d has 8 member variables so maybe we'd have to have a single sentinel boolean for all of them and initialise them together.

Option 4

Add an extra level of indirection in constructors so that when we construct an object is has the chance to construct its underlying implementation object and initialise it outside of its own constructor. This would mean lots of boilerplate code because the class the client uses, which is really a proxy for the underlying implementation class, would need to expose the same interface. Probably only really possible using some automatic code generation tools and would be pretty invasive.

It would be great if someone else could take a look at this and see if my analysis make sense or if I have made some fundamental error in understanding the situation.

@mattpap
Copy link
Contributor

mattpap commented Mar 5, 2024

This is a known problem, I think, from the very beginning, though we don't have prior issues open for this. This hasn't been fixed for so long, because of the strong bias towards initialization from JSON, which implies deferred initialization and is handled at the Document/deserialization level. For reference, views suffer from the same problem, actually much worse, but given that they aren't initialized by the user, then we can comfortably implement two stage initialization for them, handled by build_view() or build_views().

HasProps.initialize() has to functions:

  • initialize non-properties based off property values
  • prepare for HasProps.connect_signals()

Both have to be done after properties are initialized, because properties are initialized all at once in HasProps.constructor.

Having HasProps.initialize() has the additional disadvantage of not being able to use TypeScript's readonly modifier on attributes initialized in initialize(). It's also a big contributor to why we can't enable strictPropertyInitialization.

This would be very easy to resolve if JavaScript allowed static callable signatures for classes. It kind of supports this, but only for the old-style "classes", not the new-style that use class keyword. In this approach we would only initialize properties in constructors (when calling with new) and call initialize() and connect_signals() when calling without new (after an object was constructed with new). This could potentially be made backwards compatible. We could approximate this with a static method, e.g. build(), and constructing objects with e.g. Plot.build({...properties}), but it's a very ugly API.

A different and more practical approach I've been considering, is to initialize() and connect_signals() when attaching a Document to a model. This would make both styles of initialization (from JSON and manual) equivalent in their functionality and behavior. initialize() would be called exactly once, whereas connect_signals() and disconnect_signals() would be called on _doc_attach and _doc_detach. Not calling connect_signals() in constructors would also reduce the need for initialize() significantly, e.g. all signals would be defined in constructors and would be readonly.

@ianthomas23
Copy link
Member Author

Thanks for the clarification Mateusz. I like the solution in your last paragraph. I would happily work on this, but I understand from the notes of this week's meeting (which unfortunately I couldn't attend as I was on a delayed flight) that you've already started working on it?

@ianthomas23
Copy link
Member Author

I've looked at the idea of moving the initialize() to HasProps.attach_document() and the connect_signals() to _doc_attached and it fixes the initial problem. But there are multiple layers of problems caused by calling derived-class virtual functions in the base class constructor. For example, Range1d has a reset_start callback that updates the member variable _reset_start defined here:

static {
this.define<Range1d.Props>(({Float, Nullable}) => ({
reset_start: [ Nullable(Float), null, {
on_update(reset_start, self: Range1d) {
self._reset_start = reset_start ?? self.start
},
}],
reset_end: [ Nullable(Float), null, {
on_update(reset_end, self: Range1d) {
self._reset_end = reset_end ?? self.end
},
}],
}))

But this is called during HasProps.initialize_props() from within the HasProps constructor and so self._reset_start isn't guaranteed to be allocated and when you try to reset the viewport self._reset_start is null.

@pastewka
Copy link

#13872

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants