Dispatcher stream error, tuner may have crashed. #3405
Description
Environment:
- NNI version:2.0
- NNI mode (local|remote|pai):local
- Client OS: win10
- Server OS (for remote mode only):
- Python version: 3.7.8
- PyTorch/TensorFlow version: torch 1.7
- Is conda/virtualenv/venv used?: conda
- Is running in Docker?: no
Log message:
- nnimanager.log:
[2021-02-26 09:43:02] ERROR [ 'Dispatcher error: This socket has been ended by the other party' ]
[2021-02-26 09:43:02] ERROR [ 'Error: Dispatcher stream error, tuner may have crashed.\n at EventEmitter.dispatcher.onError (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\nnimanager.js:533:32)\n at EventEmitter.emit (events.js:198:13)\n at Socket.IpcInterface.outgoingStream.on (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\ipcInterface.js:42:72)\n at Socket.emit (events.js:198:13)\n at Socket.writeAfterFIN [as write] (net.js:399:8)\n at IpcInterface.sendCommand (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\ipcInterface.js:49:38)\n at NNIManager.pingDispatcher (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\nnimanager.js:378:29)' ]
[2021-02-26 09:43:02] INFO [ 'Change NNIManager status from: RUNNING to: ERROR' ]
[2021-02-26 09:43:02] WARNING [ 'Commands jammed in buffer!' ]
- dispatcher.log:
[2021-02-26 09:42:04] INFO (GP_Tuner_AutoML/Thread-1) Generate paramageters:
{'layers': 10, 'neurons': 20}
[2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) Received trial result.
[2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) value :-0.010340445765208602
[2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) parameter : {'layers': 10, 'neurons': 20}
[2021-02-26 09:42:53] WARNING (medianstop_Assessor/Thread-2) trial_end: trial_job_id does not exist in running_history
[2021-02-26 09:42:53] ERROR (nni.runtime.msg_dispatcher_base/Thread-1) 'str' object has no attribute 'decode'
Traceback (most recent call last):
File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher_base.py", line 88, in command_queue_worker
self.process_command(command, data)
File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher_base.py", line 147, in process_command
command_handlers[command](data)
File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher.py", line 101, in handle_request_trial_jobs
params_list = self.tuner.generate_multiple_parameters(ids, st_callback=self.send_trial_callback)
File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\tuner.py", line 132, in generate_multiple_parameters
res = self.generate_parameters(parameter_id, **kwargs)
File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\algorithms\hpo\gp_tuner\gp_tuner.py", line 123, in generate_parameters
self._gp.fit(self._space.params, self._space.target)
File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 249, in fit
bounds))
File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 504, in _constrained_optimization
_check_optimize_result("lbfgs", opt_res)
File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\utils\optimize.py", line 243, in _check_optimize_result
).format(solver, result.status, result.message.decode("latin1"))
AttributeError: 'str' object has no attribute 'decode'
[2021-02-26 09:42:57] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher exiting...
[2021-02-26 09:42:59] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher terminiated
- nnictl stdout and stderr:
The code worked a month ago, but today it does not. I want to run 100 trials, but the experiment stops partway through with the error 'Dispatcher stream error, tuner may have crashed'.
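A note on the traceback in dispatcher.log: the final frame calls `.decode("latin1")` on `result.message`, and in Python 3 only `bytes` objects have a `decode` method, so the call fails whenever the message is already a `str`. The sketch below reproduces that failure mode in isolation and shows one tolerant way to normalise such a value. `message_to_text` is a hypothetical helper written for illustration; it is not part of scikit-learn, SciPy, or NNI.

```python
def message_to_text(message, encoding="latin1"):
    """Return message as str, decoding only when it is bytes (hypothetical helper)."""
    if isinstance(message, bytes):
        return message.decode(encoding)
    return str(message)


# A str has no .decode method in Python 3, which is the error seen in the log.
try:
    "CONVERGENCE".decode("latin1")
except AttributeError as exc:
    error_text = str(exc)

print(error_text)
print(message_to_text(b"CONVERGENCE"))  # bytes are decoded
print(message_to_text("CONVERGENCE"))   # str passes through unchanged
```

This only illustrates why the exception is raised; the actual remedy discussed in this thread is to change the installed package versions.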
Activity
szhang963 commented on Feb 27, 2021
I hit the same error, but only a few trials ran successfully for me.
I want to know what caused it.
Could it be because I edited the code while NNI was running?
J-shang commented on Mar 1, 2021
@drxmy Hi, could you share your sklearn and scipy versions? Updating to sklearn>=0.24.1 or downgrading to scipy<=1.5.3 may solve this. See:
https://stackoverflow.com/questions/65682019/attributeerror-str-object-has-no-attribute-decode-in-fitting-logistic-regre
scikit-optimize/scikit-optimize#981
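A minimal sketch for checking the versions mentioned above, to be run inside the same conda environment that launches NNI. The thresholds are taken only from this comment (sklearn>=0.24.1, scipy<=1.5.3). The function names `parse_version`, `matches_suspect_combination`, and `installed_version` are hypothetical helpers, and the simple numeric parsing is an assumption that ignores pre-release suffixes.

```python
import re


def parse_version(text):
    """Turn '1.5.3' into (1, 5, 3); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def matches_suspect_combination(sklearn_version, scipy_version):
    """True when sklearn is older than 0.24.1 and scipy is newer than 1.5.3."""
    return (parse_version(sklearn_version) < (0, 24, 1)
            and parse_version(scipy_version) > (1, 5, 3))


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    sk, sp = installed_version("sklearn"), installed_version("scipy")
    print("sklearn:", sk, "scipy:", sp)
    if sk and sp and matches_suspect_combination(sk, sp):
        print("This pairing matches the combination described in the comment.")
```

If the printed versions differ from what `conda list` reports, the interpreter running NNI may be importing the packages from a different environment.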
J-shang commented on Mar 1, 2021
@zs963048949 Hi, could you share more information, such as dispatcher.log and nnimanager.log under ~/nni-experiments/EXPERIMENT_ID/log? Most NNI training services and tuners do not support modifying user code while an experiment is running, so if you need to fix the code, it is better to start a new experiment.
drxmy commented on Mar 1, 2021
@J-shang My scipy is 1.6.1 and sklearn is 0.24.1. Thank you for replying! I will try to downgrade scipy.
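For readers gathering the two log files requested earlier in this thread, here is a small sketch that prints the tail of each one. It assumes the default location quoted above (~/nni-experiments/EXPERIMENT_ID/log); `experiment_log_dir` and `tail` are hypothetical helpers, and `EXPERIMENT_ID` is a placeholder to replace with your own experiment ID.

```python
from pathlib import Path


def experiment_log_dir(experiment_id, root=None):
    """Build <root>/<experiment_id>/log, defaulting root to ~/nni-experiments."""
    base = Path(root) if root is not None else Path.home() / "nni-experiments"
    return base / experiment_id / "log"


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()[-n:]


if __name__ == "__main__":
    log_dir = experiment_log_dir("EXPERIMENT_ID")  # placeholder ID
    for name in ("dispatcher.log", "nnimanager.log"):
        print(f"==> {log_dir / name}")
        for line in tail(log_dir / name):
            print(line)
```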