import openai

BASE_URL = "http://localhost:8007/v1"  # port 8000 or 8005
API_KEY = "<my-key>"

openai_client = openai.OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
)

chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"

completion = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello"}
    ],
    temperature=0.0,
    n=1,
    seed=42,
    max_tokens=2048,
    extra_body={
        "chat_template": chat_template
    },
)
print(completion.choices[0].message.content)
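For reference, the Jinja chat template above can be mirrored in plain Python to preview the prompt string the server is expected to build. This is only a sketch: the `bos_token` default below is an assumption (Llama-3-style templates typically use `<|begin_of_text|>`), and `render_chat` is a hypothetical helper, not part of vLLM or the OpenAI client.

```python
def render_chat(messages, bos_token="<|begin_of_text|>", add_generation_prompt=True):
    # Plain-Python equivalent of the Jinja template: wrap each message in
    # header/footer tokens, prepend bos_token to the first message, and
    # optionally open an assistant turn at the end.
    out = []
    for i, m in enumerate(messages):
        content = ("<|start_header_id|>" + m["role"] + "<|end_header_id|>\n\n"
                   + m["content"].strip() + "<|eot_id|>")
        if i == 0:
            content = bos_token + content
        out.append(content)
    if add_generation_prompt:
        out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)

prompt = render_chat([{"role": "user", "content": "Hello"}])
print(prompt)
```

Rendering the template locally like this is a quick way to sanity-check it before passing it through `extra_body`.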
Then I got the error:
llm-vllm-dev | INFO 09-16 04:20:27 server.py:228] vLLM ZMQ RPC Server was interrupted.
llm-vllm-dev | Future exception was never retrieved
llm-vllm-dev | future: <Future finished exception=AttributeError("'_OpNamespace' '_C' object has no attribute 'ggml_dequantize'")>
llm-vllm-dev | Traceback (most recent call last):
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
llm-vllm-dev | async for request_output in results_generator:
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
llm-vllm-dev | async for output in await self.add_request(
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
llm-vllm-dev | raise result
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 48, in _log_task_completion
llm-vllm-dev | return_value = task.result()
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 733, in run_engine_loop
llm-vllm-dev | result = task.result()
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 673, in engine_step
llm-vllm-dev | request_outputs = await self.engine.step_async(virtual_engine)
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 340, in step_async
llm-vllm-dev | outputs = await self.model_executor.execute_model_async(
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 314, in execute_model_async
llm-vllm-dev | output = await make_async(self.execute_model
llm-vllm-dev | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
llm-vllm-dev | result = self.fn(*self.args, **self.kwargs)
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 226, in execute_model
llm-vllm-dev | output = self.driver_method_invoker(self.driver_worker,
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 380, in _async_driver_method_invoker
llm-vllm-dev | return driver.execute_method(method, *args, **kwargs).get()
llm-vllm-dev | File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
llm-vllm-dev | raise self.result.exception
llm-vllm-dev | AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Quang-elec44 changed the title from "[Bug]: vllm-cpu docker: AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'" to "[Bug]: vllm-cpu docker gguf: AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'" on Sep 16, 2024.
I don't have much bandwidth to port the CPU kernel right now, especially since GGUF quantization performance on GPU is still under-optimized due to the currently out-of-date GPU kernel. :(
Any contributions to support GGUF quantization on CPU are welcome!
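The failure pattern here, a custom-op namespace that lacks a registered kernel, can be guarded against before a request ever reaches the engine. A minimal sketch follows; the `op_available` helper and the `FakeOpNamespace` stand-in are hypothetical, not vLLM API (in a real deployment the namespace checked would be `torch.ops._C`, which is an assumption about the build):

```python
def op_available(namespace, name):
    """Return True if the compiled op `name` is registered on `namespace`."""
    return getattr(namespace, name, None) is not None


class FakeOpNamespace:
    """Stand-in for torch.ops._C; a CPU-only build lacks the GGUF kernels."""
    pass


# A CPU-only build would fail this check, so a server could refuse to load
# GGUF-quantized models instead of raising AttributeError mid-request.
print(op_available(FakeOpNamespace(), "ggml_dequantize"))
```

Failing fast at startup with a clear message would be friendlier than the `AttributeError` surfacing from inside the async engine loop.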
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
First, I followed this instruction and built my docker image.
Then I started my container with the docker-compose.yml file below.
My running script:
Then I got the error: