Seeing an issue in production where this signal handler runs in a way that
leaves the eventlet hub unable to switch back to its runloop.
My hunch is that this is happening due to the use of the threadpool and signal
handlers being forced to execute on the primary thread. That said, no repro
case whatsoever. HEISENBUG!
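
For reference, a common way to sidestep this class of problem is to have the
handler only record that the signal arrived and let the runloop do the real
work. This is a minimal sketch of that pattern, not what this change does; the
Controller class, pending_respawns and runloop_once are hypothetical names,
and plain Python stands in for eventlet here.

```python
import os
import signal
import time


class Controller(object):
    """Hypothetical stand-in for the real controller."""

    def __init__(self):
        self.pending_respawns = 0  # bumped by the signal handler
        self.spawned = 0           # bumped by the runloop

    def handle_deadlychild(self, signum, frame):
        # Do the bare minimum here: no logging, no locks, no hub switches.
        self.pending_respawns += 1

    def runloop_once(self):
        # Runs outside the handler, so logging and spawning are safe here.
        while self.pending_respawns:
            self.pending_respawns -= 1
            self.spawned += 1  # placeholder for spawn_children(number=1)


controller = Controller()
signal.signal(signal.SIGUSR1, controller.handle_deadlychild)
os.kill(os.getpid(), signal.SIGUSR1)

# Give the interpreter a chance to run the Python-level handler.
while not controller.pending_respawns:
    time.sleep(0.01)

controller.runloop_once()
print("children spawned from the runloop: %d" % controller.spawned)
```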
Change-Id: Id303e3755d89d7e2aab06dde17719fef843b01a2
In production environments, I've seen some threading.Lock acquire calls
killing the controller. I believe this patch should resolve that issue:
A child intends to die, starting replacement before it dies
Controller.spawn_children(number=1)
Traceback (most recent call last):
File "/usr/local/bin/spawning", line 8, in <module>
load_entry_point('Spawning==0.9.5', 'console_scripts', 'spawning')()
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 501, in main
start_controller(sock, factory, factory_args)
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 278, in start_controller
c.run()
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 236, in run
self.runloop()
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 158, in runloop
eventlet.sleep(0.1)
File "/usr/local/lib/python2.6/site-packages/eventlet/greenthread.py", line 30, in sleep
hub.switch()
File "/usr/local/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 177, in switch
return self.greenlet.switch()
File "/usr/local/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 226, in run
self.wait(sleep_time)
File "/usr/local/lib/python2.6/site-packages/eventlet/hubs/poll.py", line 84, in wait
presult = self.do_poll(seconds)
File "/usr/local/lib/python2.6/site-packages/eventlet/hubs/epolls.py", line 55, in do_poll
return self.poll.poll(seconds)
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 212, in handle_deadlychild
self.spawn_children(number=1)
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 121, in spawn_children
self.log.debug('Controller.spawn_children(number=%d)' % number)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1044, in debug
self._log(DEBUG, msg, args, **kwargs)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1173, in _log
self.handle(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1183, in handle
self.callHandlers(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1220, in callHandlers
hdlr.handle(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 681, in handle
self.release()
File "/usr/local/lib/python2.6/logging/__init__.py", line 635, in release
self.lock.release()
File "/usr/local/lib/python2.6/threading.py", line 142, in release
self.__block.release()
File "/usr/local/lib/python2.6/site-packages/eventlet/semaphore.py", line 80, in release
def release(self, blocking=True):
File "/usr/local/lib/python2.6/site-packages/spawning/spawning_controller.py", line 211, in handle_deadlychild
self.log.debug('A child intends to die, starting replacement before it dies')
File "/usr/local/lib/python2.6/logging/__init__.py", line 1044, in debug
self._log(DEBUG, msg, args, **kwargs)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1173, in _log
self.handle(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1183, in handle
self.callHandlers(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 1220, in callHandlers
hdlr.handle(record)
File "/usr/local/lib/python2.6/logging/__init__.py", line 677, in handle
self.acquire()
File "/usr/local/lib/python2.6/logging/__init__.py", line 628, in acquire
self.lock.acquire()
File "/usr/local/lib/python2.6/threading.py", line 123, in acquire
rc = self.__block.acquire(blocking)
File "/usr/local/lib/python2.6/site-packages/eventlet/semaphore.py", line 71, in acquire
hubs.get_hub().switch()
File "/usr/local/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 162, in switch
assert cur is not self.greenlet, 'Cannot switch to MAINLOOP from MAINLOOP'
AssertionError: Cannot switch to MAINLOOP from MAINLOOP
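
The underlying hazard can be reproduced without eventlet. This is a rough
sketch, assuming a POSIX system and a script running in the main thread: the
signal handler runs on top of a frame that already holds a lock and then asks
for that same lock. The names lock, observed and handler are made up for the
demo, and the acquire is non-blocking so the demo reports the conflict instead
of hanging.

```python
import os
import signal
import threading
import time

lock = threading.Lock()  # stand-in for the logging handler's lock
observed = []            # what the signal handler saw when it ran


def handler(signum, frame):
    # Non-blocking acquire so this demo cannot deadlock; a blocking acquire
    # would wait forever on a lock held by the frame we just interrupted.
    acquired = lock.acquire(False)
    observed.append(acquired)
    if acquired:
        lock.release()


signal.signal(signal.SIGUSR1, handler)

with lock:
    # The signal arrives while the lock is held, as in the traceback above.
    os.kill(os.getpid(), signal.SIGUSR1)
    while not observed:
        time.sleep(0.01)

print("handler could take the lock: %s" % observed[0])
```

With a green lock, the failed acquire turns into a hub switch, which is where
the "Cannot switch to MAINLOOP from MAINLOOP" assertion comes from when the
handler is already running in the hub's greenlet.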
Change-Id: I18a63313279963ba51a18ee221137660be473c8f
The bug was caused by the os.close() in kill_children. Closing the pipe there
lets the controller reuse the pipe's fileno for a new pipe, which causes
problems when the children take a while to exit. The solution is to close the
pipe only when the child actually exits (which the code already does).
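
The fileno reuse is easy to see in isolation. This is a small sketch, assuming
a POSIX system where new descriptors get the lowest free numbers and nothing
else in the process opens files in between; it only shows the reuse, not the
controller code itself.

```python
import os

# Open a pipe and close both ends right away, as an early os.close() would.
r1, w1 = os.pipe()
os.close(r1)
os.close(w1)

# The next pipe is handed the same descriptor numbers again.
r2, w2 = os.pipe()
reused = set([r1, w1]) == set([r2, w2])
print("first pipe: %r, second pipe: %r, reused: %s"
      % ((r1, w1), (r2, w2), reused))

os.close(r2)
os.close(w2)
```

Anything still keyed on the old numbers now points at the new pipe, which is
why the close has to wait until the child is really gone.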
This rules out using eventlet.tpool or eventlet.db_pool later on.
Thanks to rdw@ for pointing out the issue
Change-Id: I1013dd3df1516dc99db37d8e54ff69a9a964bd2d
During kill_children() the parent process can receive SIGUSR1 from the
child process it just terminated. This triggers it to launch a new child
which modifies self.child_pipes in the middle of iteration.
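
A minimal sketch of the failure and one way around it, assuming child_pipes is
a dict keyed by pid (an assumption made for illustration); spawn_replacement
is a hypothetical stand-in for the SIGUSR1 handler launching a new child.

```python
def spawn_replacement(pipes):
    # Stand-in for the handler: registers a pipe for a freshly spawned child.
    pipes[max(pipes) + 1] = "pipe-new"


# Iterating over the dict directly breaks once an entry is added mid-loop.
child_pipes = {101: "pipe-a", 102: "pipe-b"}
failed = False
try:
    for pid in child_pipes:
        spawn_replacement(child_pipes)
except RuntimeError:
    failed = True

# Iterating over a snapshot tolerates additions made during the loop.
child_pipes = {101: "pipe-a", 102: "pipe-b"}
visited = []
for pid, pipe in list(child_pipes.items()):
    visited.append(pid)
    spawn_replacement(child_pipes)

print("direct iteration failed: %s, snapshot visited: %r" % (failed, visited))
```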
Signed-off-by: R. Tyler Ballance <tyler@monkeypox.org>
The `watch_memory` function is no longer referenced. Also moved the controller
code out into its own function to reduce the indent levels inside of `main()`
Change-Id: Ice650dec754290500219c3cc3bf0997aa5c22fa3