As usual, the source code first:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    {'n_eatimatiors': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error', return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)

What this code is for, and what it does: model tuning with grid search.
1. Manually fiddling with hyperparameters until you find a good combination is hard work.
2. GridSearchCV does the searching for you. Tell it which hyperparameters you want to experiment with and which values to try, and it evaluates every possible combination of those hyperparameter values using cross-validation.
Here is what happens when the code above is run:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-353-54613827b5ad> in <module>
      8 grid_search=GridSearchCV(forest_reg,param_grid,cv=5,
      9                          scoring='neg_mean_squared_error',return_train_score=True)
---> 10 grid_search.fit(housing_prepared,housing_labels)

(... intermediate frames through sklearn\model_selection\_search.py,
 joblib\parallel.py and sklearn\model_selection\_validation.py omitted ...)

d:\python3.8.5\lib\site-packages\sklearn\base.py in set_params(self, **params)
    247                 key, delim, sub_key = key.partition('__')
    248                 if key not in valid_params:
--> 249                     raise ValueError('Invalid parameter %s for estimator %s. '
    250                                      'Check the list of available parameters '
    251                                      'with `estimator.get_params().keys()`.' %

ValueError: Invalid parameter n_eatimatiors for estimator RandomForestRegressor(max_features=2, random_state=42). Check the list of available parameters with `estimator.get_params().keys()`.

The error message is very long, but the core of it is the last few lines: the arrow pointing at line 249, and the final error message.
    247                 key, delim, sub_key = key.partition('__')
    248                 if key not in valid_params:
--> 249                     raise ValueError('Invalid parameter %s for estimator %s. '
    250                                      'Check the list of available parameters '
    251                                      'with `estimator.get_params().keys()`.' %

ValueError: Invalid parameter n_eatimatiors for estimator RandomForestRegressor(max_features=2, random_state=42). Check the list of available parameters with `estimator.get_params().keys()`.

Read this carefully. First locate "n_eatimatiors"; the reason given is "Invalid parameter"; so the conclusion is that "n_eatimatiors" is an invalid parameter name. The next step is to find "n_eatimatiors" in the code and check its spelling.
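As a side note, the error message itself tells you how to verify this: list the estimator's valid parameter names with get_params(). A quick sketch (the exact parameter list may vary slightly across sklearn versions):

```python
from sklearn.ensemble import RandomForestRegressor

forest_reg = RandomForestRegressor(random_state=42)

# Print the valid hyperparameter names for this estimator.
valid_params = sorted(forest_reg.get_params().keys())
print(valid_params)

print('n_eatimatiors' in valid_params)  # False - not a real parameter
print('n_estimators' in valid_params)   # True  - the correct spelling
```

Any key in param_grid that is not in this list will trigger the same ValueError when GridSearchCV calls set_params() on a clone of the estimator.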
param_grid = [
    {'n_eatimatiors': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

The spelling is indeed wrong: "n_eatimatiors" should be "n_estimators".
param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

Re-running it now works:
GridSearchCV(cv=5, estimator=RandomForestRegressor(random_state=42),
             param_grid=[{'max_features': [2, 4, 6, 8],
                          'n_estimators': [3, 10, 30]},
                         {'bootstrap': [False], 'max_features': [2, 3, 4],
                          'n_estimators': [3, 10]}],
             return_train_score=True, scoring='neg_mean_squared_error')

Takeaway: you run into all kinds of problems while writing code, so learning to analyze and solve them is essential. Practice that skill deliberately.
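Once the fit succeeds, the interesting part is inspecting the search results. A self-contained sketch below uses random synthetic data as a stand-in for housing_prepared / housing_labels (which come from the housing example and are not shown here), so the actual best parameters and score will differ from the real dataset's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-ins for housing_prepared / housing_labels.
rng = np.random.RandomState(42)
X = rng.rand(100, 8)
y = rng.rand(100)

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                           cv=5, scoring='neg_mean_squared_error',
                           return_train_score=True)
grid_search.fit(X, y)

# The first dict contributes 3*4 = 12 combinations, the second 1*2*3 = 6,
# so the grid has 18 candidates; with cv=5 that means 18 * 5 = 90 model fits.
print(len(grid_search.cv_results_['params']))   # 18

print(grid_search.best_params_)                 # best combination found
print(np.sqrt(-grid_search.best_score_))        # RMSE of the best candidate
```

Since scoring='neg_mean_squared_error' makes best_score_ a negative MSE, negating it and taking the square root recovers an RMSE, which is easier to interpret.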