Thursday, November 27, 2014

When nova-compute suddenly can't connect to libvirt one day..

I think I burned half a day on this..


[-] Starting compute node (version 2014.1.3)
[-] Connecting to libvirt: qemu:///system _get_new_connection /opt/openstack/src/nova/nova/virt/libvirt/driver.py:671
[req-b01c4d29-a158-4e11-a076-74a37019642d None None] Cannot update service status on host: compute000,due to an unexpected exception.
Traceback (most recent call last):
  File "/opt/openstack/src/nova/nova/virt/libvirt/driver.py", line 2841, in _set_host_enabled
    service = service_obj.Service.get_by_compute_host(ctx, CONF.host)
  File "/opt/openstack/src/nova/nova/objects/base.py", line 110, in wrapper
    args, kwargs)
  File "/opt/openstack/src/nova/nova/conductor/rpcapi.py", line 425, in object_class_action
    objver=objver, args=args, kwargs=kwargs)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
    wait_for_reply=True, timeout=timeout)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
    timeout=timeout)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
    return self._send(target, ctxt, message, wait_for_reply, timeout)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
    result = self._waiter.wait(msg_id, timeout)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
    reply, ending = self._poll_connection(msg_id, timeout)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
    % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 5dcbc226dfd94309aef2e71db27dd9e9

[-] Registering for lifecycle events  _get_new_connection /opt/openstack/src/nova/nova/virt/libvirt/driver.py:687
[-] URI qemu:///system does not support events: Cannot write data: Broken pipe
[-] Registering for connection events:  _get_new_connection /opt/openstack/src/nova/nova/virt/libvirt/driver.py:699

[-] internal error: client socket is closed
Traceback (most recent call last):
  File "/opt/openstack/src/nova/nova/openstack/common/threadgroup.py", line 117, in wait
    x.wait()
  File "/opt/openstack/src/nova/nova/openstack/common/threadgroup.py", line 49, in wait
    return self.thread.wait()
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 168, in wait
    return self._exit_event.wait()
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/event.py", line 116, in wait
    return hubs.get_hub().switch()
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 187, in switch
    return self.greenlet.switch()
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 194, in main
    result = function(*args, **kwargs)
  File "/opt/openstack/src/nova/nova/openstack/common/service.py", line 483, in run_service
    service.start()
  File "/opt/openstack/src/nova/nova/service.py", line 163, in start
    self.manager.init_host()
  File "/opt/openstack/src/nova/nova/compute/manager.py", line 1029, in init_host
    self.driver.init_host(host=self.host)
  File "/opt/openstack/src/nova/nova/virt/libvirt/driver.py", line 657, in init_host
    self._do_quality_warnings()
  File "/opt/openstack/src/nova/nova/virt/libvirt/driver.py", line 640, in _do_quality_warnings
    caps = self.get_host_capabilities()
  File "/opt/openstack/src/nova/nova/virt/libvirt/driver.py", line 2877, in get_host_capabilities
    xmlstr = self._conn.getCapabilities()
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/tpool.py", line 179, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/tpool.py", line 139, in proxy_call
    rv = execute(f,*args,**kwargs)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/eventlet/tpool.py", line 77, in tworker
    rv = meth(*args,**kwargs)
  File "/opt/openstack/venv/local/lib/python2.7/site-packages/libvirt.py", line 3303, in getCapabilities
    if ret is None: raise libvirtError ('virConnectGetCapabilities() failed', conn=self)
libvirtError: internal error: client socket is closed


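At some point this hellish error showed up out of nowhere..
At first glance it looks like an error caused by a failed libvirt connection..
It seems as if the following three errors occur in sequence and break things:
1. libvirt failure, 2. amqp connection failure, 3. a specific function call failing because the libvirt connection failed

In reality, the libvirt failures in 1 and 3 can be ignored, and 2, the amqp connection failure, is the actual heart of the problem..
I figured I would deal with 2 after sorting out 1 and 3, and ended up flailing around all day..
In the end, this problem can occur when nova-conductor is not consuming the requests that arrive over amqp, so the RPC call from nova-compute times out waiting for a reply.. (in my case, this happened even though nova-conductor was showing :) in the service status..)
So if your setup uses nova-conductor, you can reset that connection by restarting nova-conductor or by restarting rabbitmq..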

service nova-conductor restart
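
To keep from chasing the wrong error next time, the reasoning above can be written down as a tiny triage helper. This is only a rough sketch: the function name `triage` and the return labels are made up for illustration, and the marker strings are simply copied from the log in this post, so other versions may word them differently.

```python
# Hypothetical triage helper (not part of nova). The marker strings are
# taken from the log excerpt in this post.
AMQP_MARKERS = ("MessagingTimeout",)
LIBVIRT_MARKERS = (
    "Cannot write data: Broken pipe",
    "client socket is closed",
)


def triage(log_text):
    """Return 'amqp', 'libvirt', or 'unknown' for a nova-compute log excerpt."""
    has_amqp = any(marker in log_text for marker in AMQP_MARKERS)
    has_libvirt = any(marker in log_text for marker in LIBVIRT_MARKERS)
    if has_amqp:
        # An RPC timeout wins even when libvirt errors surround it:
        # look at nova-conductor / rabbitmq first.
        return "amqp"
    if has_libvirt:
        return "libvirt"
    return "unknown"


if __name__ == "__main__":
    sample = (
        "MessagingTimeout: Timed out waiting for a reply to message ID ...\n"
        "libvirtError: internal error: client socket is closed\n"
    )
    print(triage(sample))
```

If it reports `amqp`, start with nova-conductor and rabbitmq before touching libvirt.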

Honestly, this is an error that is really hard to figure out from the error messages alone..
