This repository was archived by the owner on Sep 18, 2024. It is now read-only.
This issue has been moved to a discussion.

Dispatcher stream error, tuner may have crashed. #3405

@drxmy

Description


Environment:

  • NNI version: 2.0
  • NNI mode (local|remote|pai): local
  • Client OS: win10
  • Server OS (for remote mode only):
  • Python version: 3.7.8
  • PyTorch/TensorFlow version: torch 1.7
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?: no

Log message:

  • nnimanager.log:
    [2021-02-26 09:43:02] ERROR [ 'Dispatcher error: This socket has been ended by the other party' ]
    [2021-02-26 09:43:02] ERROR [ 'Error: Dispatcher stream error, tuner may have crashed.\n at EventEmitter.dispatcher.onError (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\nnimanager.js:533:32)\n at EventEmitter.emit (events.js:198:13)\n at Socket.IpcInterface.outgoingStream.on (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\ipcInterface.js:42:72)\n at Socket.emit (events.js:198:13)\n at Socket.writeAfterFIN [as write] (net.js:399:8)\n at IpcInterface.sendCommand (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\ipcInterface.js:49:38)\n at NNIManager.pingDispatcher (d:\anaconda\envs\py_1.7\lib\site-packages\nni_node\core\nnimanager.js:378:29)' ]
    [2021-02-26 09:43:02] INFO [ 'Change NNIManager status from: RUNNING to: ERROR' ]
    [2021-02-26 09:43:02] WARNING [ 'Commands jammed in buffer!' ]

  • dispatcher.log:
    [2021-02-26 09:42:04] INFO (GP_Tuner_AutoML/Thread-1) Generate paramageters:
    {'layers': 10, 'neurons': 20}
    [2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) Received trial result.
    [2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) value :-0.010340445765208602
    [2021-02-26 09:42:51] INFO (GP_Tuner_AutoML/Thread-1) parameter : {'layers': 10, 'neurons': 20}
    [2021-02-26 09:42:53] WARNING (medianstop_Assessor/Thread-2) trial_end: trial_job_id does not exist in running_history
    [2021-02-26 09:42:53] ERROR (nni.runtime.msg_dispatcher_base/Thread-1) 'str' object has no attribute 'decode'
    Traceback (most recent call last):
    File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher_base.py", line 88, in command_queue_worker
    self.process_command(command, data)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher_base.py", line 147, in process_command
    command_handlers[command](data)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\runtime\msg_dispatcher.py", line 101, in handle_request_trial_jobs
    params_list = self.tuner.generate_multiple_parameters(ids, st_callback=self.send_trial_callback)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\tuner.py", line 132, in generate_multiple_parameters
    res = self.generate_parameters(parameter_id, **kwargs)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\nni\algorithms\hpo\gp_tuner\gp_tuner.py", line 123, in generate_parameters
    self._gp.fit(self._space.params, self._space.target)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 249, in fit
    bounds))
    File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 504, in _constrained_optimization
    _check_optimize_result("lbfgs", opt_res)
    File "D:\anaconda\envs\py_1.7\lib\site-packages\sklearn\utils\optimize.py", line 243, in _check_optimize_result
    ).format(solver, result.status, result.message.decode("latin1"))
    AttributeError: 'str' object has no attribute 'decode'
    [2021-02-26 09:42:57] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher exiting...
    [2021-02-26 09:42:59] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher terminiated

  • nnictl stdout and stderr:

The code worked a month ago, but today it doesn't. I want to run 100 trials, but the experiment stops partway through with the error 'Dispatcher stream error, tuner may have crashed'.
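The root cause shows up at the bottom of the dispatcher.log traceback: sklearn's `_check_optimize_result` calls `result.message.decode("latin1")`, but in this environment the L-BFGS-B `message` is already a `str`, and a Python 3 `str` has no `decode` method. The GP tuner thread raises, the dispatcher exits, and nnimanager then reports the closed socket as "tuner may have crashed". Older SciPy releases appear to have returned this message as `bytes`, which would explain why the same code worked earlier. The sketch below reproduces the failure in plain Python and shows a type-tolerant conversion; `optimizer_message_text` is a hypothetical helper for illustration, not sklearn's actual fix.

```python
def optimizer_message_text(message) -> str:
    """Return an optimizer status message as str, whether it arrives as bytes or str.

    Hypothetical helper: calling .decode() unconditionally, as the sklearn line
    in the traceback does, only works when the message is bytes.
    """
    if isinstance(message, bytes):
        return message.decode("latin1")
    return message


# Reproduce the failure mode from the traceback: a Python 3 str has no .decode().
# The message text here is only an example string.
try:
    "ABNORMAL_TERMINATION_IN_LNSRCH".decode("latin1")
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'decode'

# The type-tolerant version accepts both forms and returns the same text.
print(optimizer_message_text(b"ABNORMAL_TERMINATION_IN_LNSRCH"))
print(optimizer_message_text("ABNORMAL_TERMINATION_IN_LNSRCH"))
```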

Activity

self-assigned this
on Feb 26, 2021

szhang963 commented on Feb 27, 2021


I also hit this error, but only a few trials ran successfully for me.


ERROR [ 'Error: Dispatcher stream error, tuner may have crashed.\n    at EventEmitter.dispatcher.onError (/home/xxx/anaconda3/lib/python3.6/site-packages/nni_node/core/nnimanager.js:533:32)\n    at EventEmitter.emit (events.js:198:13)\n    at Socket.IpcInterface.outgoingStream.on (/home/xxx/anaconda3/lib/python3.6/site-packages/nni_node/core/ipcInterface.js:42:72)\n    at Socket.emit (events.js:198:13)\n    at Socket.writeAfterFIN [as write] (net.js:399:8)\n    at IpcInterface.sendCommand (/home/xxx/anaconda3/lib/python3.6/site-packages/nni_node/core/ipcInterface.js:49:38)\n    at NNIManager.onTrialJobMetrics (/home/xxx/anaconda3/lib/python3.6/site-packages/nni_node/core/nnimanager.js:550:29)' ]

I'd like to know what caused it.
Could it be because I modified the code while NNI was running?


J-shang commented on Mar 1, 2021

Contributor

@drxmy Hi, could you share your sklearn and scipy versions? Upgrading to sklearn>=0.24.1 or downgrading to scipy<=1.5.3 may solve this.

https://stackoverflow.com/questions/65682019/attributeerror-str-object-has-no-attribute-decode-in-fitting-logistic-regre
scikit-optimize/scikit-optimize#981
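To check an environment against the two constraints suggested above, a rough sketch like the following compares version strings. `parse_version` and `matches_suggested_fix` are hypothetical helpers; the thresholds are only the ones proposed in this comment, not a verified compatibility table, and the simple parser drops pre-release tags such as `rc1`.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '0.24.1' into (0, 24, 1); non-numeric suffixes like 'rc1' are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.6' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def matches_suggested_fix(sklearn_version: str, scipy_version: str) -> bool:
    """True if either suggested constraint holds: sklearn>=0.24.1 or scipy<=1.5.3."""
    return (parse_version(sklearn_version) >= (0, 24, 1)
            or parse_version(scipy_version) <= (1, 5, 3))
```

In a real environment you would pass `sklearn.__version__` and `scipy.__version__` to `matches_suggested_fix`.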


J-shang commented on Mar 1, 2021

Contributor

@zs963048949 Hi, could you share more information, such as dispatcher.log and nnimanager.log under ~/nni-experiments/EXPERIMENT_ID/log? Most NNI training services and tuners do not support modifying user code while an experiment is running, so if you need to change the code, it's better to start a new experiment.
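A small sketch for pulling the relevant lines out of those two logs, assuming the default ~/nni-experiments/EXPERIMENT_ID/log layout mentioned above. The helper names are hypothetical, and the filter (lines tagged ERROR plus the final exception line of a traceback) is a simple heuristic matched to the log format shown in this issue, not an NNI tool.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

LOG_NAMES = ("dispatcher.log", "nnimanager.log")
# Final line of a Python traceback, e.g. "AttributeError: 'str' object has no attribute 'decode'"
_EXCEPTION_LINE = re.compile(r"^\w+(Error|Exception):")


def experiment_log_paths(experiment_id: str, root: Optional[Path] = None) -> Dict[str, Path]:
    """Map each log name to its expected path under <root>/<experiment_id>/log."""
    base = root if root is not None else Path.home() / "nni-experiments"
    log_dir = base / experiment_id / "log"
    return {name: log_dir / name for name in LOG_NAMES}


def error_lines(log_text: str) -> List[str]:
    """Return lines tagged ERROR plus bare exception lines that end a traceback."""
    hits = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if " ERROR " in stripped or _EXCEPTION_LINE.match(stripped):
            hits.append(stripped)
    return hits


def collect_errors(experiment_id: str, root: Optional[Path] = None) -> Dict[str, List[str]]:
    """Read whichever of the two logs exist and return their error lines."""
    found = {}
    for name, path in experiment_log_paths(experiment_id, root).items():
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            found[name] = error_lines(text)
    return found
```

Calling `collect_errors("EXPERIMENT_ID")` with the real experiment ID would return the ERROR entries from each log, which is usually enough to tell whether the tuner itself raised (as in the dispatcher.log above) or the manager only saw the aftermath.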


drxmy commented on Mar 1, 2021

Author

@J-shang My scipy is 1.6.1 and sklearn is 0.24.1. Thank you for replying! I will try to downgrade scipy.

locked and limited conversation to collaborators on Jun 10, 2021


Participants: @J-shang, @szhang963, @scarlett2018, @drxmy, @kvartet


        Dispatcher stream error, tuner may have crashed. · Issue #3405 · microsoft/nni