-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Added method predict_interval() to PolynomialTrendForecaster forecaster #6424
base: main
Are you sure you want to change the base?
Conversation
import numpy as np | ||
from scipy.stats import norm | ||
|
||
def l_days_since_1970(idx): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this duplicates _get_X_numpy_int_from_pandas
and might use a different unit of time?
Could we resolve the duplication and check things are consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
May I ask for a few things to deduplicate:
- I think the
l_days_since_1970
duplicates_get_X_numpy_int_from_pandas
, or does sth similar. Could we use the same function in both cases? This is also for consistency in the index. If you think the existing function should be changed, that's in-principle fine, but we need to discuss the why. - Your
predict_interval
overrides the publicpredict_internal
, and hence does not comply with the extension contract. Any implementations should be done in private methods, such as_predict_interval
. - Looking at the algorithm for interval prediction, it seems this is in fact a variance prediction and intervals are obtained from assuming a normal distirbution. The latter is actually the default when a variance prediction is implemented and nothing else, so anything that happens after computing
v
duplicates very similar code in the base class, see_predict_quantiles
there. I would, hence, take the part untilv
and use this to implement_predict_var
. - not blocking, but I wonder: should we give users the option to compute prediction intervals via a parameter? The precomputation of residuals does require additional runtime, so perhaps we do that only if a parameter, say,
prediction_intervals = "on"
is set? And the default is not? Many people will use thie estimator for detrending and do not need prediction intervals, so that use should be as speedy as possible.
Regarding your first point, where is _get_X_numpy_int_from_pandas to be found? |
In VS code also has a nice "search code base" feature - control-shift-F (on windows), or top left, the magnifier symbol. |
I looked at the _get_X_numpy_int_from_pandas and it is similar to l_days_since_1970(). I don't have a very strong opinion on which is better. I would point out that these are basically 1-line or 2-line functions so the amount of duplication is minimal.
|
Yes, the issue is not amount of duplication but consistency. E.g., are we counting days or seconds or similar. By using the same function, we ensure that we do the same thing in all places, and if it gets changed in the future (think bugfix or something about
I think the discrepancy is because the two functions are actually inconsistent in the Since the point prediction in case of Whether one now prefers the one or the other, we need to give the pre-existing convention precedence and ensure the new addition is consistent with it. We can consider, later, to change it, but that needs to go through a deprecation cycle with user warnings, see https://www.sktime.net/en/stable/developer_guide/deprecation.html |
The new method predict_interval() for the PolynomialTrendForecaster computes the prediction interval for a polynomial trend model. The formulas used in the calculation are from Hyndman's FPP3 (Forecasting Principles and Practice, 3rd edition), section 7.9.
For an example of usage see section 4 in the attached pdf.
PullRequestNotes.pdf