Time Series Machine Learning Regression Framework

Building a time series forecasting pipeline to predict weekly sales transaction

Pourya, Apr 6

Introduction

Fig. 1) Let’s borrow the concept of time from entropy and thermodynamics: entropy is the only quantity in the physical sciences that seems to imply a particular direction of progress, sometimes called an arrow of time.

As time progresses, the second law of thermodynamics states that the entropy of an isolated system never decreases in large systems over significant periods of time.

Can we consider our Universe as an isolated large system? Then what?

Time! One of the most challenging concepts in the Universe.

I have studied physics for many years and have seen how even the most brilliant scientists struggle with the concept of time.

In machine learning we are far from those complicated physics theories, in which the common understanding of time itself changes; even so, the presence of time and the ordering of observations can make an ML problem much more complicated.

Machine Learning for Time Series:

Fig. 2) In time series forecasting, we use historical data to forecast the future. George Santayana: Those Who Do Not Learn History Are Doomed To Repeat It. The right figure is taken from https://www.kdnuggets.com/2018/02/cartoon-valentine-machine-learning.html

A time series is a sequence of observations taken sequentially in time.

Time series forecasting involves fitting models on historical data and then using them to predict future observations.

For example, the measurements from some number of minutes, days, or months ago are used as inputs to predict the next minutes, days, or months.

Fig. 3) Transform Time Series to Supervised Machine Learning.

The steps by which the data are shifted backward in time (in the sequence) are called lag times, or lags.

Therefore, a time series problem can be transformed into a supervised ML problem by adding lags of the measurements as inputs of the supervised model; see Fig. 3, right.

Generally, treat the number of lags as a hyperparameter to explore.

Fig. 4) Transform the time series to supervised machine learning by adding lags. A lag is simply a shift of the data one or more steps backward in time.
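
To make the lag idea concrete, here is a minimal sketch in plain Python. The function name make_lagged_rows and the sales values are made up for illustration; they are not part of the code used in this post, which builds lags with pandas instead.

```python
def make_lagged_rows(series, n_lags):
    """Turn a sequence into (lag features, target) rows.

    Row t uses series[t-1], ..., series[t-n_lags] as inputs and series[t]
    as the target. The first n_lags points have no complete history, so
    they are skipped (the equivalent of dropping the null rows).
    """
    rows = []
    for t in range(n_lags, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags, series[t]))
    return rows


sales = [10, 12, 9, 14, 15]  # made-up values, for illustration only
rows = make_lagged_rows(sales, n_lags=2)
for lags, target in rows:
    print(lags, "->", target)
```

With two lags, five observations yield only three usable rows, which is why adding lags always costs some data at the start of the series.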

Cross-Validation for Time Series

Cross-validation for time series differs from cross-validation for machine-learning problems in which time or sequence is not involved.

When time is not involved, we select a random subset of the data as a validation set to estimate the accuracy of the model.

In time series, we usually predict a value in the future.

Therefore, the validation data always has to occur after the training data.

Two schemes can be used for time series CV: the sliding-window and the forward-chaining validation methods.

Fig. 5) Basically, there are two kinds of cross-validation for time series: sliding window and forward chaining. In this post, we will consider the forward chaining cross-validation method.

Fig. 5, top, shows the sliding window method.

For this method, we train on n data points and validate the prediction on the next n data points, then slide the 2n training/validation window forward in time for the next step.

Fig. 5, bottom, shows the forward chaining method.

For this method, we train on all data points up to a given time and validate the prediction on the next m data points, then move the split forward so that the training set grows with each fold, as in Code 3.

In this way, we can estimate the parameters of our model.

To test the validity of the model, we might use a block of data at the end of our time series which is reserved for testing the model with the learned parameters.

Fig. 6) Forward chaining cross-validation.

Fig. 6 shows how forward chaining CV works.

Here, there is one lag.

Thus, we train the model on the first through third time steps (seconds, minutes, hours, days, etc.), then validate on the fourth, and so on.
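
The two schemes can be sketched as index generators in plain Python. This is an illustration only, not the Kfold_time class used later; the function names are hypothetical, and sliding the window by exactly one window length is an assumption, since the step size is not specified above.

```python
def forward_chaining_splits(n_points, first_val, n_val=1):
    """Yield (train_idx, val_idx): train on everything before the validation block."""
    start = first_val
    while start + n_val <= n_points:
        yield list(range(0, start)), list(range(start, start + n_val))
        start += n_val


def sliding_window_splits(n_points, window):
    """Yield (train_idx, val_idx) with equal-sized, adjacent train and validation windows."""
    start = 0
    while start + 2 * window <= n_points:
        train = list(range(start, start + window))
        val = list(range(start + window, start + 2 * window))
        yield train, val
        start += window  # assumed step: slide by one window length


fc = list(forward_chaining_splits(n_points=6, first_val=3))
sw = list(sliding_window_splits(n_points=6, window=2))
for train, val in fc:
    print("forward chaining:", train, "->", val)
for train, val in sw:
    print("sliding window:  ", train, "->", val)
```

In both schemes every validation index is later than every training index; the difference is that the forward chaining training set keeps growing, while the sliding window forgets the oldest points.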

Now that we are familiar with the time series problem, let’s choose one and build a forecasting model.

Forecasting Weekly Sales Transactions

Imagine that the manager of a shop has asked us to build an ML model to forecast the number of sales for the next week.

The model must be run every Sunday and the prediction must be reported every Monday morning.

Then, the manager can decide on the number of orders for the week.

The manager provides us with the sales data of 811 products for 52 weeks.

The sales data can be found in the UCI Repository or on Kaggle.

Let’s look at the data.

Many data scientists might create a separate model for each product to forecast its number of sales.

While this can work well, each model would have only 52 data points, which is very few. So although this approach is possible, it might not be the best solution.

Besides, if the sales of two or more products interact, we would miss those interactions by building a separate model for each product.

Therefore, in this post, we will investigate how to build a single forecasting model for multiple time series.

Data preparing

The raw data has a column for the product code and 52 weekly columns for sales.

First, we are going to create a new data frame by melting the data on the weeks.

Thus, the new data frame has three columns: product code, week, and sales.

Besides, the prefixes “W” and “P” are dropped from the week and the product code, respectively.
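
The melting step might look like the sketch below. The tiny raw table and its week column names (W0, W1, W2) are assumptions made for illustration; only the resulting column names Product_Code, Week, and Sales are taken from the pipeline code used in this post.

```python
import pandas as pd

# Made-up stand-in for the raw table: one row per product, one column per week.
raw = pd.DataFrame({
    "Product_Code": ["P1", "P2"],
    "W0": [11, 7],
    "W1": [12, 6],
    "W2": [10, 3],
})

# Wide -> long: one row per (product, week) pair.
df = raw.melt(id_vars="Product_Code", var_name="Week", value_name="Sales")

# Drop the leading "P" and "W" and store both codes as integers.
df["Product_Code"] = df["Product_Code"].str[1:].astype(int)
df["Week"] = df["Week"].str[1:].astype(int)

# Keep each product's weeks in chronological order.
df = df.sort_values(["Product_Code", "Week"]).reset_index(drop=True)
print(df)
```

Sorting by product and week matters for the next step, because lags are computed by shifting rows within each product.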

So, let’s take a look at the head and the tail of the new data frame.

To get familiar with the dataset, the distribution of sales is plotted in Fig. 7.

There is a large number of products with very few sales, so the distribution is concentrated at low sales values, with a long tail toward high sales.

The effect of this issue on modeling will be discussed later.

Fig. 7) Sales distribution. There are many products with very low sales.

Basic Feature Engineering

Since the goal of this post is not feature engineering for time series, we will try to keep this part as simple as possible.

Let’s create two features that are commonly used for time series.

The first is one step back in time, the 1-lag (shift = 1); the second is the difference between the number of purchases a week ago (W1) and the week before that, that is, two weeks ago (W2).

After that, since the lag and the diff produce nulls in the dataset (see Fig. 4), we drop those rows.

Therefore, when we look at the head of the data frame, it starts from week = 2.
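
A minimal pandas sketch of these two features is shown here on made-up numbers. The helper name add_lag_and_diff and the diff column name are hypothetical; the classes actually used in this post are the ones listed in the coding section.

```python
import pandas as pd


def add_lag_and_diff(df, col="Sales", group_col="Product_Code"):
    """Add a 1-week lag of `col` and the diff of that lag, per product."""
    out = df.sort_values([group_col, "Week"]).copy()
    lag_col = "1_Week_Ago_" + col
    # Shift within each product so one product's sales never leak into another's.
    out[lag_col] = out.groupby(group_col)[col].shift(1)
    # Difference between last week's sales and the week before that.
    out["Diff_" + lag_col] = out.groupby(group_col)[lag_col].diff(1)
    # The first two weeks of each product have no complete history.
    return out.dropna().reset_index(drop=True)


df = pd.DataFrame({
    "Product_Code": [1, 1, 1, 1, 2, 2, 2, 2],
    "Week":         [0, 1, 2, 3, 0, 1, 2, 3],
    "Sales":        [11, 12, 10, 8, 7, 6, 3, 2],
})
features = add_lag_and_diff(df)
print(features)
```

The lag costs one week per product and the diff of the lag costs a second one, which is why the first remaining week is week 2.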

The “ToSupervised” and “ToSupervisedDiff” classes, Code 1 and Code 2 in the coding section, are used to obtain the new data frame through a simple pipeline:

    steps = [('1_step', ToSupervised('Sales', 'Product_Code', 1)),
             ('1_step_diff', ToSupervisedDiff('1_Week_Ago_Sales', 'Product_Code', 1, dropna=True))]
    super_1 = Pipeline(steps).fit_transform(df)

Now, the data has a proper shape for supervised ML.

Forward-Chaining Cross-Validation:

The other problem when working with time series is that cross-validation has to respect time.

We choose forward chaining for the model validation.

To avoid judging the model on a small number of weeks in which it happens to do well, we will validate on every week from week 40 onward, one week at a time, and compute the score for each.

The k-fold code for this scheme can be found in Code 3.

    kf = Kfold_time(target='Sales', date_col='Week', date_init=40, date_final=52)

Since this post is just a demonstration, I am not separating a test dataset.

In a real project, always keep some periods out as a test dataset to evaluate the model on unseen data.

Metric

Since the problem is regression, there are several well-known metrics to evaluate the model, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Root Mean Squared Log Error (RMSLE), R-squared, and so forth.

Each of these metrics has its own use case, and they penalize errors differently, although there are some similarities among them as well.

In this post, RMSLE is chosen to evaluate the model.
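
The code in this post passes a function named rmsle as the scoring argument, but its definition is not shown. A minimal sketch, assuming the usual definition (the square root of the mean squared difference between log(1 + prediction) and log(1 + actual)) and using only the standard library, could be:

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error: sqrt(mean((log1p(pred) - log1p(true))^2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


print(rmsle([3, 0, 12], [3, 0, 12]))  # perfect predictions give 0.0
```

Because the error is computed on logarithms, it behaves like a relative error: missing a product that sells 2 units by 1 unit costs more than missing a product that sells 50 units by 1 unit. The function accepts any pair of equal-length sequences of non-negative numbers, so it should also work with the arrays produced by the cross-validation splitter.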

Baseline:

Usually, when we build a model, we start from a very simple assumption that we expect ML to improve on.

Here, let’s assume that each product sells the same number of units next week as it did in the current week.

It means that if product-1 is sold 10 times in week-1, its sales number will also be 10 in week-2.

This is generally not a bad assumption.

So, let’s consider this assumption as our baseline model.

The baseline model is coded in Code 4. Let’s see how the baseline model works:

    base_model = BaseEstimator('1_Week_Ago_Sales')
    errors = []
    for indx, fold in enumerate(kf.split(super_1)):
        X_train, X_test, y_train, y_test = fold
        error = base_model.score(X_test, y_test, rmsle)
        errors.append(error)
        print("Fold: {}, Error: {:.3f}".format(indx, error))
    print('Total Error {:.3f}'.format(np.mean(errors)))

    Fold: 0, Error: 0.520
    Fold: 1, Error: 0.517
    Fold: 2, Error: 0.510
    Fold: 3, Error: 0.508
    Fold: 4, Error: 0.534
    Fold: 5, Error: 0.523
    Fold: 6, Error: 0.500
    Fold: 7, Error: 0.491
    Fold: 8, Error: 0.506
    Fold: 9, Error: 0.505
    Fold: 10, Error: 0.522
    Fold: 11, Error: 0.552
    Total Error 0.516

Here, folds 0 to 11 correspond to validation weeks 40 to 51, because Code 3 iterates over range(date_init, date_final), which excludes date_final = 52.

The mean RMSLE of the baseline model over these 12 weeks is 0.516.

This can be considered a large error, which might originate from the large number of items that sold in very small amounts, as shown in Fig. 7.

Machine Learning Models:

Now, we will apply ML to improve on the baseline prediction.

Let’s define a time series regressor class, Code 5, which works with our time series cross-validation.

This class takes the cv and the model, and it returns the model prediction and its score.

There is a wide range of ML algorithms that can be used as an estimator.

Here, we select a Random Forest.

Simply put, RF can be considered a combination of bagging and random selection of feature columns on top of decision trees.

Therefore, it reduces the variance of the prediction of a decision tree model.

So, it usually performs better than a single tree, but often worse than ensemble methods designed to reduce the bias error of the decision tree model, such as boosting.

    model = RandomForestRegressor(n_estimators=1000, n_jobs=-1, random_state=0)
    steps_1 = [('1_step', ToSupervised('Sales', 'Product_Code', 1)),
               ('1_step_diff', ToSupervisedDiff('1_Week_Ago_Sales', 'Product_Code', 1, dropna=True)),
               # scoring is a required argument of TimeSeriesRegressor (Code 5)
               ('predic_1', TimeSeriesRegressor(model=model, cv=kf, scoring=rmsle))]
    super_1_p = Pipeline(steps_1).fit(df)
    Model_1_Error = super_1_p.score(df)

we get

    Fold: 0, Error: 0.4624
    Fold: 1, Error: 0.4596
    Fold: 2, Error: 0.4617
    Fold: 3, Error: 0.4666
    Fold: 4, Error: 0.4712
    Fold: 5, Error: 0.4310
    Fold: 6, Error: 0.4718
    Fold: 7, Error: 0.4494
    Fold: 8, Error: 0.4608
    Fold: 9, Error: 0.4470
    Fold: 10, Error: 0.4746
    Fold: 11, Error: 0.4865
    Total Error 0.4619

It seems that the model works and the error decreases.

Let’s add more lags and evaluate the model again.

Since we built the pipeline, adding more lags is very simple.

    steps_3 = [('1_step', ToSupervised('Sales', 'Product_Code', 3)),
               ('1_step_diff', ToSupervisedDiff('1_Week_Ago_Sales', 'Product_Code', 1)),
               ('2_step_diff', ToSupervisedDiff('2_Week_Ago_Sales', 'Product_Code', 1)),
               ('3_step_diff', ToSupervisedDiff('3_Week_Ago_Sales', 'Product_Code', 1, dropna=True)),
               ('predic_3', TimeSeriesRegressor(model=model, cv=kf, scoring=rmsle))]
    super_3_p = Pipeline(steps_3).fit(df)
    super_3_p.score(df)  # prints the per-fold errors

    Fold: 0, Error: 0.4312
    Fold: 1, Error: 0.4385
    Fold: 2, Error: 0.4274
    Fold: 3, Error: 0.4194
    Fold: 4, Error: 0.4479
    Fold: 5, Error: 0.4070
    Fold: 6, Error: 0.4395
    Fold: 7, Error: 0.4333
    Fold: 8, Error: 0.4387
    Fold: 9, Error: 0.4305
    Fold: 10, Error: 0.4591
    Fold: 11, Error: 0.4534
    Total Error 0.4355

It seems that the prediction error decreases again and the model learns more.

We could continue to add lags and see how the performance of the model changes; however, we will postpone this until we use LGBM as the estimator.

Statistical Transformations:

The distribution of sales in Fig. 7 shows that the data are concentrated at low sales numbers, on the left of the plot.

Log transforms are usually useful for such skewed distributions, as they tend to expand the values that fall in the range of lower magnitudes and compress the values that fall in the range of higher magnitudes.

The interpretability of a model changes when we apply statistical transformations, since the coefficients no longer tell us about the original features but about the transformed ones.

Therefore, while we apply np.log1p to the sales numbers to bring their distribution closer to a normal distribution, we also apply np.expm1 to the predictions to map them back to the original scale; see Code 6, TimeSeriesRegressorLog.
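
As a small illustration of why this pair of functions is convenient, the sketch below uses math.log1p and math.expm1 from the standard library (the scalar counterparts of the NumPy functions named above) on made-up sales counts.

```python
import math

sales = [0, 1, 3, 15, 30, 70]  # made-up weekly sales counts

# log1p(x) = log(1 + x) is defined at x = 0, unlike a plain log.
logged = [math.log1p(s) for s in sales]

# expm1(y) = exp(y) - 1 is the exact inverse, so predictions made on the
# log scale can be mapped back to sales counts.
restored = [math.expm1(v) for v in logged]

# Gaps between small values shrink far less than gaps between large values.
low_gap_raw, high_gap_raw = sales[2] - sales[1], sales[5] - sales[4]
low_gap_log, high_gap_log = logged[2] - logged[1], logged[5] - logged[4]
print(low_gap_raw, high_gap_raw)
print(round(low_gap_log, 3), round(high_gap_log, 3))
```

On the raw scale the gap between 30 and 70 is twenty times the gap between 1 and 3; after the transform the two gaps are of similar size, which is the compression of large values described above.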

Now, we repeat the calculation with the mentioned transformation:

    steps_3_log = [('1_step', ToSupervised('Sales', 'Product_Code', 3)),
                   ('1_step_diff', ToSupervisedDiff('1_Week_Ago_Sales', 'Product_Code', 1)),
                   ('2_step_diff', ToSupervisedDiff('2_Week_Ago_Sales', 'Product_Code', 1)),
                   ('3_step_diff', ToSupervisedDiff('3_Week_Ago_Sales', 'Product_Code', 1, dropna=True)),
                   ('predic_3', TimeSeriesRegressorLog(model=model, cv=kf, scoring=rmsle))]
    super_3_p_log = Pipeline(steps_3_log).fit(df)
    super_3_p_log.score(df)  # prints the per-fold errors

so we have

    Fold: 0, Error: 0.4168
    Fold: 1, Error: 0.4221
    Fold: 2, Error: 0.4125
    Fold: 3, Error: 0.4035
    Fold: 4, Error: 0.4332
    Fold: 5, Error: 0.3977
    Fold: 6, Error: 0.4263
    Fold: 7, Error: 0.4122
    Fold: 8, Error: 0.4301
    Fold: 9, Error: 0.4375
    Fold: 10, Error: 0.4462
    Fold: 11, Error: 0.4727
    Total Error 0.4259

This shows that the performance of the model improves and the error decreases again.

Ensemble ML:

Now, it is time to use a stronger ML estimator to improve the forecasting.

We choose LightGBM as the new estimator.

So let’s repeat the calculation:

    model_lgb = LGBMRegressor(n_estimators=1000, learning_rate=0.01)
    steps_3_log_lgbm = [('1_step', ToSupervised('Sales', 'Product_Code', 3)),
                        ('1_step_diff', ToSupervisedDiff('1_Week_Ago_Sales', 'Product_Code', 1)),
                        ('2_step_diff', ToSupervisedDiff('2_Week_Ago_Sales', 'Product_Code', 1)),
                        ('3_step_diff', ToSupervisedDiff('3_Week_Ago_Sales', 'Product_Code', 1, dropna=True)),
                        ('predic_3', TimeSeriesRegressorLog(model=model_lgb, cv=kf, scoring=rmsle))]
    super_3_p_log_lgbm = Pipeline(steps_3_log_lgbm).fit(df)
    super_3_p_log_lgbm.score(df)  # prints the per-fold errors

then,

    Fold: 0, Error: 0.4081
    Fold: 1, Error: 0.3980
    Fold: 2, Error: 0.3953
    Fold: 3, Error: 0.3949
    Fold: 4, Error: 0.4202
    Fold: 5, Error: 0.3768
    Fold: 6, Error: 0.4039
    Fold: 7, Error: 0.3868
    Fold: 8, Error: 0.3984
    Fold: 9, Error: 0.4075
    Fold: 10, Error: 0.4209
    Fold: 11, Error: 0.4520
    Total Error 0.4052

Again, we successfully improve the prediction.

Tuning Number of Steps:

In this section, we are going to tune the number of steps (lags/diffs).

I intentionally postponed tuning the steps to this section, after switching to LGBM as the regressor, because it is faster than RF.

Fig. 8 clearly shows that adding more steps to the model reduces the error; however, as we expect, past a threshold of around steps = 14, adding further steps does not reduce the error significantly.

You might be interested in defining an error threshold for stopping this process.

Steps = 20 is chosen for the rest of the calculation.

Please check Code 7 A and B for the tuning.

    model_lgbm = LGBMRegressor(n_estimators=1000, learning_rate=0.01)
    list_scores2 = stepsTune(df, TimeSeriesRegressorLog(model=model_lgbm, scoring=rmsle, cv=kf, verbosity=False), 20)

Fig. 8) Tuning the number of lags/diffs. The plot shows the RMSLE as a function of the number of steps (lags/diffs). The model improves by adding more steps; however, more than 14 steps do not improve the model significantly.

Tune Hyperparameters:

In this part, we are going to implement the grid search method in a way that lets us apply it through the pipeline; see Code 8 A and B.

The code in Code 8 A is borrowed from the Sklearn library.

The aim of this part is not to build a fully tuned model.

We are only trying to show what the workflow looks like.

After slight tuning, the error becomes RMSLE = 0.3868 for these two hyperparameters: {'learning_rate': 0.005, 'n_estimators': 1500}.

    params = {'n_estimators': [100, 500, 1000, 1500, 2000],
              'learning_rate': [0.005, .01, .1]}
    steps_20 = getDataFramePipeline(20)
    super_20 = Pipeline(steps_20).fit_transform(df)
    model_lgbm2 = LGBMRegressor(random_state=0)
    tune_lgbm = TimeSeriesGridSearch(model=model_lgbm2, cv=kf, param_grid=params, verbosity=False, scoring=rmsle)

When the best hyperparameters lie at the edge of the search grid, as learning_rate = 0.005 does here, we should reconsider the range of the hyperparameters and rerun the search, although we will not do so in this post.
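
The edge check described above is easy to automate. The sketch below enumerates the same grid with itertools.product and reports which of the best hyperparameters sit on a boundary of their range; the helper names iter_grid and params_on_edge are hypothetical and are not part of the grid search classes in the coding section.

```python
from itertools import product

params = {
    "n_estimators": [100, 500, 1000, 1500, 2000],
    "learning_rate": [0.005, 0.01, 0.1],
}


def iter_grid(param_grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = sorted(param_grid)  # sorted keys give a reproducible order
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def params_on_edge(best, param_grid):
    """Names of hyperparameters whose best value is the min or max of its range."""
    return [
        name for name, value in best.items()
        if value in (min(param_grid[name]), max(param_grid[name]))
    ]


grid = list(iter_grid(params))
best = {"learning_rate": 0.005, "n_estimators": 1500}  # best point reported above
print(len(grid))                     # 15 combinations, each scored on every fold
print(params_on_edge(best, params))  # ['learning_rate']
```

Here only learning_rate is flagged, which suggests extending the grid toward smaller learning rates before trusting the result.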

Prediction vs. Real Sales

Fig. 9 shows the predicted values versus the actual sales values for week 52.

The model works well for sales numbers up to 15; however, it forecasts poorly for sales around 30.

As discussed for Fig. 7, we might build different models for different ranges of sales to overcome this problem and obtain a more robust forecasting model, although further modeling is beyond the scope of this post, which is already long.

Fig. 9) Prediction of sales vs. real sales number. The model works properly for low sales numbers (less than 15); however, it does not work well for large sales numbers. Therefore, this might be a good motivation to build two models for low and high sales items.

Finally, Fig. 10 shows all of our attempts to forecast the sales.

We started with a very simple assumption as the baseline and tried to improve on it by using different lags/diffs, a statistical transformation, and different machine learning algorithms.

The baseline error was 0.516 and the tuned model error was 0.3868, which means a 25% reduction in the error.

Fig. 10) The scores of our different models. We could reduce the baseline error by 25%.
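
The 25% figure is the relative reduction with respect to the baseline. A short check of the arithmetic, using the mean RMSLE values reported in this post (the stage labels are shorthand added here):

```python
scores = {
    "baseline": 0.516,
    "RF, 1 lag": 0.4619,
    "RF, 3 lags": 0.4355,
    "RF, 3 lags, log": 0.4259,
    "LGBM, 3 lags, log": 0.4052,
    "LGBM, tuned": 0.3868,
}


def relative_reduction(baseline, score):
    """Fraction of the baseline error that has been removed."""
    return (baseline - score) / baseline


for name, score in scores.items():
    print(f"{name:20s} {score:.4f} {relative_reduction(scores['baseline'], score):6.1%}")

reduction = relative_reduction(scores["baseline"], scores["LGBM, tuned"])
```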

There are still many ways to improve the presented model, for example: handling products properly as categorical variables, more extensive feature engineering, further hyperparameter tuning, trying various machine learning algorithms, and blending and stacking.

Conclusion:

We built a time series forecasting pipeline to predict weekly sales transactions.

We started with a simple logical assumption as a baseline model; then we reduced the baseline error by 25% by building a pipeline that includes basic feature engineering, a statistical transformation, Random Forest and LGBM estimators, and finally hyperparameter tuning.

Besides, we discussed different time series cross-validation methods.

Moreover, we showed how to use Sklearn base classes to build a pipeline.

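Coding:

The complete code of this post can be found on my GitHub.

Code 1.

    import numpy as np
    import pandas as pd
    from sklearn import base
    from sklearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestRegressor
    from lightgbm import LGBMRegressor

    class ToSupervised(base.BaseEstimator, base.TransformerMixin):
        """Add lagged copies of `col`, shifted within each `groupCol` group."""

        def __init__(self, col, groupCol, numLags, dropna=False):
            self.col = col
            self.groupCol = groupCol
            self.numLags = numLags
            self.dropna = dropna

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            # Work on the data passed to transform, not on a copy stored in fit.
            tmp = X.copy()
            for i in range(1, self.numLags + 1):
                tmp[str(i) + '_Week_Ago' + "_" + self.col] = tmp.groupby([self.groupCol])[self.col].shift(i)
            if self.dropna:
                tmp = tmp.dropna()
                tmp = tmp.reset_index(drop=True)
            return tmp

Code 2.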

    class ToSupervisedDiff(base.BaseEstimator, base.TransformerMixin):
        """Add differences of `col` over i steps, computed within each `groupCol` group."""

        def __init__(self, col, groupCol, numLags, dropna=False):
            self.col = col
            self.groupCol = groupCol
            self.numLags = numLags
            self.dropna = dropna

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            # Work on the data passed to transform, not on a copy stored in fit.
            tmp = X.copy()
            for i in range(1, self.numLags + 1):
                tmp[str(i) + '_Week_Ago_Diff_' + "_" + self.col] = tmp.groupby([self.groupCol])[self.col].diff(i)
            if self.dropna:
                tmp = tmp.dropna()
                tmp = tmp.reset_index(drop=True)
            return tmp

Code 3.

    from itertools import chain

    class Kfold_time(object):
        """Forward-chaining splitter: train on all dates before i, validate on date i."""

        def __init__(self, **options):
            self.target = options.pop('target', None)
            self.date_col = options.pop('date_col', None)
            self.date_init = options.pop('date_init', None)
            self.date_final = options.pop('date_final', None)
            if options:
                raise TypeError("Invalid parameters passed: %s" % str(options))
            if (self.target is None or self.date_col is None
                    or self.date_init is None or self.date_final is None):
                raise TypeError("Incomplete inputs")

        def _train_test_split_time(self, X):
            n_arrays = len(X)
            if n_arrays == 0:
                raise ValueError("At least one array required as input")
            # range() excludes date_final, so the last validation date is date_final - 1.
            for i in range(self.date_init, self.date_final):
                train = X[X[self.date_col] < i]
                val = X[X[self.date_col] == i]
                X_train, X_test = train.drop([self.target], axis=1), val.drop([self.target], axis=1)
                y_train, y_test = train[self.target].values, val[self.target].values
                yield X_train, X_test, y_train, y_test

        def split(self, X):
            cv_t = self._train_test_split_time(X)
            return chain(cv_t)

Code 4.

    class BaseEstimator(base.BaseEstimator, base.RegressorMixin):
        def __init__(self, predCol):
            """
            As a base model we assume the number of sales
            last week and this week are the same.
            Input:
                predCol: 1-week-ago sales column
            """
            self.predCol = predCol

        def fit(self, X, y):
            return self

        def predict(self, X):
            prediction = X[self.predCol].values
            return prediction

        def score(self, X, y, scoring):
            prediction = self.predict(X)
            error = scoring(y, prediction)
            return error

Code 5.

    class TimeSeriesRegressor(base.BaseEstimator, base.RegressorMixin):
        def __init__(self, model, cv, scoring, verbosity=True):
            self.model = model
            self.cv = cv
            self.verbosity = verbosity
            self.scoring = scoring

        def fit(self, X, y=None):
            return self

        def predict(self, X=None):
            # One column of predictions per cross-validation fold.
            pred = {}
            for indx, fold in enumerate(self.cv.split(X)):
                X_train, X_test, y_train, y_test = fold
                self.model.fit(X_train, y_train)
                pred[str(indx) + '_fold'] = self.model.predict(X_test)
            prediction = pd.DataFrame(pred)
            return prediction

        def score(self, X, y=None):
            # Refit the model on each fold and collect the validation errors.
            errors = []
            for indx, fold in enumerate(self.cv.split(X)):
                X_train, X_test, y_train, y_test = fold
                self.model.fit(X_train, y_train)
                prediction = self.model.predict(X_test)
                error = self.scoring(y_test, prediction)
                errors.append(error)
                if self.verbosity:
                    print("Fold: {}, Error: {:.4f}".format(indx, error))
            if self.verbosity:
                print('Total Error {:.4f}'.format(np.mean(errors)))
            return errors

Code 6.

    class TimeSeriesRegressorLog(base.BaseEstimator, base.RegressorMixin):
        def __init__(self, model, cv, scoring, verbosity=True):
            self.model = model
            self.cv = cv
            self.verbosity = verbosity
            self.scoring = scoring

        def fit(self, X, y=None):
            return self

        def predict(self, X=None):
            pred = {}
            for indx, fold in enumerate(self.cv.split(X)):
                X_train, X_test, y_train, y_test = fold
                # Train on log1p(y) and map predictions back with expm1, as in score().
                self.model.fit(X_train, np.log1p(y_train))
                pred[str(indx) + '_fold'] = np.expm1(self.model.predict(X_test))
            prediction = pd.DataFrame(pred)
            return prediction

        def score(self, X, y=None):
            errors = []
            for indx, fold in enumerate(self.cv.split(X)):
                X_train, X_test, y_train, y_test = fold
                # Fit on the log-transformed target, then invert the transform.
                self.model.fit(X_train, np.log1p(y_train))
                prediction = np.expm1(self.model.predict(X_test))
                error = self.scoring(y_test, prediction)
                errors.append(error)
                if self.verbosity:
                    print("Fold: {}, Error: {:.4f}".format(indx, error))
            if self.verbosity:
                print('Total Error {:.4f}'.format(np.mean(errors)))
            return errors

Code 7.

A:

    def getDataFramePipeline(i):
        # One step adds i lags; then one diff step follows per lag column.
        steps = [(str(i) + '_step', ToSupervised('Sales', 'Product_Code', i))]
        for j in range(1, i + 1):
            # Diff each j-week-ago lag column, as in the hand-written steps_3 pipeline;
            # nulls are dropped only in the last step.
            pp = (str(j) + '_step_diff',
                  ToSupervisedDiff(str(j) + '_Week_Ago_Sales', 'Product_Code', 1, dropna=(i == j)))
            steps.append(pp)
        return steps

B:

    from tqdm import tqdm

    def stepsTune(X, model, num_steps, init=1):
        # Score the model for every number of steps from init to num_steps.
        scores = []
        for i in tqdm(range(init, num_steps + 1)):
            steps = []
            steps.extend(getDataFramePipeline(i))
            steps.append(('predic_1', model))
            super_ = Pipeline(steps).fit(X)
            score_ = np.mean(super_.score(X))
            scores.append((i, score_))
        return scores

Code 8.

A:

    from collections.abc import Mapping, Sequence, Iterable
    from itertools import product
    from functools import partial, reduce
    import operator

    class TimeGridBasic(base.BaseEstimator, base.RegressorMixin):
        def __init__(self, param_grid):
            if not isinstance(param_grid, (Mapping, Iterable)):
                raise TypeError('Parameter grid is not a dict or '
                                'a list ({!r})'.format(param_grid))
            if isinstance(param_grid, Mapping):
                # wrap dictionary in a singleton list to support either dict
                # or list of dicts
                param_grid = [param_grid]
            # check if all entries are dictionaries of lists
            for grid in param_grid:
                if not isinstance(grid, dict):
                    raise TypeError('Parameter grid is not a '
                                    'dict ({!r})'.format(grid))
                for key in grid:
                    if not isinstance(grid[key], Iterable):
                        raise TypeError('Parameter grid value is not iterable '
                                        '(key={!r}, value={!r})'
                                        .format(key, grid[key]))
            self.param_grid = param_grid

        def __iter__(self):
            """Iterate over the points in the grid.

            Returns
            -------
            params : iterator over dict of string to any
                Yields dictionaries mapping each estimator parameter to one of its
                allowed values.
            """
            for p in self.param_grid:
                # Always sort the keys of a dictionary, for reproducibility
                items = sorted(p.items())
                if not items:
                    yield {}
                else:
                    keys, values = zip(*items)
                    for v in product(*values):
                        params = dict(zip(keys, v))
                        yield params

B:

    class TimeSeriesGridSearch(TimeGridBasic, base.BaseEstimator, base.RegressorMixin):
        def __init__(self, **options):
            self.model = options.pop('model', None)
            self.cv = options.pop('cv', None)
            self.verbosity = options.pop('verbosity', False)
            self.scoring = options.pop('scoring', None)
            param_grid = options.pop('param_grid', None)
            self.param_grid = TimeGridBasic(param_grid)
            if options:
                raise TypeError("Invalid parameters passed: %s" % str(options))
            if self.model is None or self.cv is None:
                raise TypeError("Incomplete inputs")

        def fit(self, X, y=None):
            self.X = X
            return self

        def _get_score(self, param):
            # Cross-validated errors for one point of the grid.
            errors = []
            for indx, fold in enumerate(self.cv.split(self.X)):
                X_train, X_test, y_train, y_test = fold
                self.model.set_params(**param).fit(X_train, y_train)
                prediction = self.model.predict(X_test)
                error = self.scoring(y_test, prediction)
                errors.append(error)
                if self.verbosity:
                    print("Fold: {}, Error: {:.4f}".format(indx, error))
            if self.verbosity:
                print('Total Error {:.4f}'.format(np.mean(errors)))
            return errors

        def score(self):
            errors = []
            get_param = []
            for param in self.param_grid:
                if self.verbosity:
                    print(param)
                errors.append(np.mean(self._get_score(param)))
                get_param.append(param)
            # Sort by error only, so that ties never try to compare two dicts.
            ranked = sorted(zip(errors, get_param), key=lambda t: t[0])
            self.sorted_errors, self.sorted_params = (list(t) for t in zip(*ranked))
            return self.sorted_errors, self.sorted_params

        def best_estimator(self, verbosity=False):
            if verbosity:
                print('error: {:.4f}.'.format(self.sorted_errors[0]))
                print('Best params:')
                print(self.sorted_params[0])
            return self.sorted_params[0]