Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Added method predict_interval() to PolynomialTrendForecaster forecaster #6424

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ericjb
Copy link
Contributor

@ericjb ericjb commented May 15, 2024

The new method predict_interval() for the PolynomialTrendForecaster computes the prediction interval for a polynomial trend model. The formulas used in the calculation are from Hyndman's FPP3 (Forecasting Principles and Practice, 3rd edition), section 7.9.

For an example of usage see section 4 in the attached pdf.

PullRequestNotes.pdf

import numpy as np
from scipy.stats import norm

def l_days_since_1970(idx):
Copy link
Collaborator

@fkiraly fkiraly May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this duplicates _get_X_numpy_int_from_pandas and might use a different unit of time?

Could we resolve the duplication and check things are consistent?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

May I ask for a few things to deduplicate:

  • I think the l_days_since_1970 duplicates _get_X_numpy_int_from_pandas, or does sth similar. Could we use the same function in both cases? This is also for consistency in the index. If you think the existing function should be changed, that's in-principle fine, but we need to discuss the why.
  • Your predict_interval overrides the public predict_internal, and hence does not comply with the extension contract. Any implementations should be done in private methods, such as _predict_interval.
  • Looking at the algorithm for interval prediction, it seems this is in fact a variance prediction and intervals are obtained from assuming a normal distirbution. The latter is actually the default when a variance prediction is implemented and nothing else, so anything that happens after computing v duplicates very similar code in the base class, see _predict_quantiles there. I would, hence, take the part until v and use this to implement _predict_var.
  • not blocking, but I wonder: should we give users the option to compute prediction intervals via a parameter? The precomputation of residuals does require additional runtime, so perhaps we do that only if a parameter, say, prediction_intervals = "on" is set? And the default is not? Many people will use thie estimator for detrending and do not need prediction intervals, so that use should be as speedy as possible.

@ericjb
Copy link
Contributor Author

ericjb commented May 21, 2024

Regarding your first point, where is _get_X_numpy_int_from_pandas to be found?

@fkiraly
Copy link
Collaborator

fkiraly commented May 22, 2024

In from sktime.forecasting.trend._util, the import is at the top of the file you were working in.

VS code also has a nice "search code base" feature - control-shift-F (on windows), or top left, the magnifier symbol.

@ericjb
Copy link
Contributor Author

ericjb commented May 22, 2024

I looked at the _get_X_numpy_int_from_pandas and it is similar to l_days_since_1970(). I don't have a very strong opinion on which is better. I would point out that these are basically 1-line or 2-line functions so the amount of duplication is minimal.
The code below shows a small sanity check where converting a pandas index from PeriodIndex to DatetimeIndex has no effect on the return value from my function, but does change the return value on the other function.

import pandas as pd
import numpy as np
from sktime.datasets import load_fpp3
from sktime.forecasting.trend._util import _get_X_numpy_int_from_pandas

def l_days_since_1970(idx):
    if isinstance(idx, pd.DatetimeIndex):
        return idx.astype('int64') // 10**9 // 86400
    elif isinstance(idx, pd.PeriodIndex):
        return idx.to_timestamp().astype('int64') // 10**9 // 86400
    else:
        raise TypeError("Index must be of type DatetimeIndex or PeriodIndex")
    
y = load_fpp3("canadian_gas")
a = l_days_since_1970(y.index)
b = _get_X_numpy_int_from_pandas(y.index)

y.index = y.index.to_timestamp()
a_post = l_days_since_1970(y.index)
b_post = _get_X_numpy_int_from_pandas(y.index)
                                      
a == a_post ## returns True's
b == b_post ## returns False's

@fkiraly
Copy link
Collaborator

fkiraly commented May 22, 2024

I would point out that these are basically 1-line or 2-line functions so the amount of duplication is minimal.

Yes, the issue is not amount of duplication but consistency. E.g., are we counting days or seconds or similar. By using the same function, we ensure that we do the same thing in all places, and if it gets changed in the future (think bugfix or something about pandas changing their index convention), it is easier to update and not miss any "parallel location".

The code below shows a small sanity check where converting a pandas index from PeriodIndex to DatetimeIndex has no effect on the return value from my function, but does change the return value on the other function.

I think the discrepancy is because the two functions are actually inconsistent in the PeriodIndex case - the existing function, _get_X_etc converts ot number of periods since 1970 start, while yours converts to number of days.

Since the point prediction in case of PeriodIndex does the former, your proba prediction would be inconsistent with it.

Whether one now prefers the one or the other, we need to give the pre-existing convention precedence and ensure the new addition is consistent with it. We can consider, later, to change it, but that needs to go through a deprecation cycle with user warnings, see https://www.sktime.net/en/stable/developer_guide/deprecation.html

@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting enhancement Adding new functionality labels May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants