[ENH] Added method predict_interval() to PolynomialTrendForecaster forecaster #6424

ericjb · 2024-05-15T04:15:19Z

The new method predict_interval() for the PolynomialTrendForecaster computes the prediction interval for a polynomial trend model. The formulas used in the calculation are from Hyndman's FPP3 (Forecasting Principles and Practice, 3rd edition), section 7.9.
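For reference, this is the standard matrix-form result for a linear regression, which is what FPP3 section 7.9 presents. A polynomial trend of degree d is the special case where the row of predictors is x* = (1, t*, (t*)^2, ..., (t*)^d) and k = d. The notation is assumed here; it is not taken from the PR's code.

```latex
\begin{aligned}
\hat{\boldsymbol\beta} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},\\
\hat\sigma_e^2 &= \frac{(\mathbf{y}-\mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol\beta})}{T-k-1},\\
\hat y &= \mathbf{x}^*\hat{\boldsymbol\beta},\\
\widehat{\operatorname{Var}}(y^* - \hat y) &= \hat\sigma_e^2\left[1 + \mathbf{x}^*(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{x}^*)'\right],\\
\text{interval:}\quad & \hat y \pm z_{1-\alpha/2}\,\hat\sigma_e\sqrt{1 + \mathbf{x}^*(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{x}^*)'}.
\end{aligned}
```

With z = 1.96 this gives the 95% interval. Only the variance line is model-specific. The last line is a generic normal-quantile step.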

For a usage example, see section 4 of the attached PDF.

PullRequestNotes.pdf

fkiraly · 2024-05-19T11:12:06Z

sktime/forecasting/trend/_polynomial_trend_forecaster.py

+        import numpy as np
+        from scipy.stats import norm
+
+        def l_days_since_1970(idx):


I think this duplicates _get_X_numpy_int_from_pandas and might use a different unit of time?

Could we resolve the duplication and check things are consistent?

fkiraly

Looks good!

May I ask for a few things to deduplicate:

1. I think l_days_since_1970 duplicates _get_X_numpy_int_from_pandas, or does something similar. Could we use the same function in both cases? This also keeps the index handling consistent. If you think the existing function should be changed, that's fine in principle, but we need to discuss why.
2. Your predict_interval overrides the public base class method predict_interval, and hence does not comply with the extension contract. Implementations should go in private methods, such as _predict_interval.
3. Looking at the algorithm for interval prediction, it seems this is in fact a variance prediction, and the intervals are obtained by assuming a normal distribution. The latter is already the default when only a variance prediction is implemented, so everything after computing v duplicates very similar code in the base class (see _predict_quantiles there). I would, hence, take the part up to v and use it to implement _predict_var.
4. Not blocking, but I wonder: should we give users the option to compute prediction intervals via a parameter? Precomputing the residuals adds runtime, so perhaps we do that only if a parameter, say prediction_intervals = "on", is set, and not by default? Many people will use this estimator for detrending and do not need prediction intervals, so that use case should be as fast as possible.
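The split suggested in the last two review points can be sketched outside sktime as a toy class. Everything here is hypothetical. `ToyPolynomialTrend` is not an sktime class, and its method names only mirror the public/private pattern described above. `prediction_intervals="on"` is the opt-in flag floated in the last point. The model-specific code computes only a variance. The interval step is generic normal-quantile logic of the kind the review says the base class already provides.

```python
import numpy as np
from scipy.stats import norm


class ToyPolynomialTrend:
    """Hypothetical toy (not sktime's API): only the variance is model-specific."""

    def __init__(self, degree=1, prediction_intervals="off"):
        self.degree = degree
        # Opt-in flag: residual-based quantities are only computed when requested.
        self.prediction_intervals = prediction_intervals

    def _design(self, t):
        # Design matrix with columns 1, t, t**2, ..., t**degree
        return np.vander(np.asarray(t, dtype=float), self.degree + 1, increasing=True)

    def fit(self, t, y):
        X = self._design(t)
        y = np.asarray(y, dtype=float)
        self.beta_, *_ = np.linalg.lstsq(X, y, rcond=None)
        if self.prediction_intervals == "on":
            # Extra work needed only for probabilistic forecasts.
            resid = y - X @ self.beta_
            self.sigma2_ = resid @ resid / (X.shape[0] - X.shape[1])  # SSE / (T - k - 1)
            self.XtX_inv_ = np.linalg.inv(X.T @ X)
        return self

    def predict(self, t_new):
        return self._design(t_new) @ self.beta_

    def _predict_var(self, t_new):
        # Model-specific part: sigma^2 * (1 + x0 (X'X)^{-1} x0') for each forecast point.
        if self.prediction_intervals != "on":
            raise RuntimeError("fit with prediction_intervals='on' to get variances")
        X_new = self._design(t_new)
        leverage = np.einsum("ij,jk,ik->i", X_new, self.XtX_inv_, X_new)
        return self.sigma2_ * (1.0 + leverage)

    def predict_interval(self, t_new, coverage=0.9):
        # Generic part (base-class logic in the real design): normal quantiles around the mean.
        mean = self.predict(t_new)
        sd = np.sqrt(self._predict_var(t_new))
        z = norm.ppf(0.5 + coverage / 2)
        return mean - z * sd, mean + z * sd


t = np.arange(24)
y = 3 + 0.5 * t + np.sin(t)
model = ToyPolynomialTrend(degree=1, prediction_intervals="on").fit(t, y)
lower, upper = model.predict_interval([24, 25, 26])
```

With the default `prediction_intervals="off"`, `fit` does only the least-squares solve. That keeps the detrending use case cheap.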

ericjb · 2024-05-21T14:55:32Z

Regarding your first point, where is _get_X_numpy_int_from_pandas to be found?

fkiraly · 2024-05-22T00:30:10Z

It is in sktime.forecasting.trend._util; the import is at the top of the file you were working in.

VS Code also has a nice "search code base" feature: Ctrl+Shift+F (on Windows), or the magnifier icon at the top left.

ericjb · 2024-05-22T12:20:47Z

I looked at _get_X_numpy_int_from_pandas, and it is similar to l_days_since_1970(). I don't have a very strong opinion on which is better. I would point out that these are basically 1-line or 2-line functions, so the amount of duplication is minimal.
The code below shows a small sanity check: converting a pandas index from PeriodIndex to DatetimeIndex has no effect on the return value of my function, but does change the return value of the other function.

```python
import pandas as pd
import numpy as np
from sktime.datasets import load_fpp3
from sktime.forecasting.trend._util import _get_X_numpy_int_from_pandas


def l_days_since_1970(idx):
    """Convert a DatetimeIndex or PeriodIndex to integer days since 1970-01-01."""
    if isinstance(idx, pd.DatetimeIndex):
        # int64 nanoseconds -> seconds -> days
        return idx.astype('int64') // 10**9 // 86400
    elif isinstance(idx, pd.PeriodIndex):
        # convert periods to their start timestamps first, then as above
        return idx.to_timestamp().astype('int64') // 10**9 // 86400
    else:
        raise TypeError("Index must be of type DatetimeIndex or PeriodIndex")


y = load_fpp3("canadian_gas")  # loaded with a PeriodIndex
a = l_days_since_1970(y.index)
b = _get_X_numpy_int_from_pandas(y.index)

# same series, index converted from PeriodIndex to DatetimeIndex
y.index = y.index.to_timestamp()
a_post = l_days_since_1970(y.index)
b_post = _get_X_numpy_int_from_pandas(y.index)

a == a_post  # all True
b == b_post  # all False
```

fkiraly · 2024-05-22T14:42:28Z

> I would point out that these are basically 1-line or 2-line functions so the amount of duplication is minimal.

Yes, the issue is not the amount of duplication but consistency, e.g., whether we are counting days, seconds, or something else. By using the same function, we ensure that we do the same thing in all places, and if it gets changed in the future (think of a bugfix, or pandas changing their index convention), it is easier to update without missing any "parallel location".

> The code below shows a small sanity check where converting a pandas index from PeriodIndex to DatetimeIndex has no effect on the return value from my function, but does change the return value on the other function.

I think the discrepancy arises because the two functions are actually inconsistent in the PeriodIndex case: the existing function, _get_X_etc, converts to the number of periods since the start of 1970, while yours converts to the number of days.

Since the point prediction uses the former for a PeriodIndex, your probabilistic prediction would be inconsistent with it.
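To make the discrepancy concrete: for a PeriodIndex, pandas stores each period as an integer ordinal counted from the 1970 epoch in units of the index frequency, while the proposed helper counts days. The sketch assumes, per the comment above, that this ordinal count matches what the existing helper returns for a PeriodIndex. It reads pandas' `asi8` directly rather than calling `_get_X_numpy_int_from_pandas`, whose internals are not shown here.

```python
import pandas as pd

# Monthly PeriodIndex starting at the 1970 epoch
idx = pd.period_range("1970-01", periods=4, freq="M")

# Period ordinals: number of periods (here, months) since 1970-01
periods_since_1970 = idx.asi8  # 0, 1, 2, 3

# Days since 1970-01-01, via the start timestamp of each period
days_since_1970 = (idx.to_timestamp() - pd.Timestamp("1970-01-01")).days.to_numpy()  # 0, 31, 59, 90
```

Because month lengths vary, the two scales are not even proportional. A design row x* built on one scale therefore cannot be combined with coefficients or (X'X)^{-1} estimated on the other.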

Whichever convention one prefers, we need to give the pre-existing one precedence and ensure the new addition is consistent with it. We can consider changing it later, but that needs to go through a deprecation cycle with user warnings, see https://www.sktime.net/en/stable/developer_guide/deprecation.html
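As a generic illustration of "warn before changing a convention", here is a minimal sketch using only the standard library. It is not sktime's deprecation machinery; the linked guide describes that process. The function `resolve_time_unit` and its option values `"periods"`/`"days"` are hypothetical. They only show how the old behaviour stays the default while users are warned.

```python
import warnings


def resolve_time_unit(unit=None):
    """Hypothetical: pick the integer time unit during a deprecation period."""
    if unit is None:
        # Not specified: keep the pre-existing convention, but warn about the planned change.
        warnings.warn(
            "The default time unit may change from 'periods' to 'days' in a future "
            "release; pass unit explicitly to silence this warning.",
            DeprecationWarning,
            stacklevel=2,
        )
        unit = "periods"
    if unit not in ("periods", "days"):
        raise ValueError(f"unit must be 'periods' or 'days', got {unit!r}")
    return unit
```

Users who pass `unit` explicitly see no warning, so their code keeps working unchanged once the default changes.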

Added method predict_interval() to PolynomialTrendForecaster forecaster

e6a449d

ericjb requested review from achieveordie, benHeid, fkiraly and yarnabrina as code owners May 15, 2024 04:15

fkiraly reviewed May 19, 2024


fkiraly requested changes May 19, 2024


fkiraly added the module:forecasting and enhancement labels May 22, 2024

