OK, so I came across a Schroedinbug this morning. A few weeks ago I added a second server to run jobs via DelayedJob, because they were eating up too many resources on the Passenger / DB / Sphinx server. Today is the first day I noticed that the jobs were not able to run searches.
    >> User.search 'asdf'
    Riddle::ConnectionError: Connection to 127.0.0.1 on 9312 failed. Connection refused - connect(2)
Problems
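- iptables was blocking access from the jobs server to the Sphinx port on the search server
- Sphinx (searchd) was listening only on 127.0.0.1, not on 0.0.0.0 (all interfaces)
- Thinking Sphinx was configured to connect to 127.0.0.1 on all servers, so the jobs server was looking for Sphinx on itself, where nothing was listening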
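Before touching any config, it helps to confirm from the jobs server whether the Sphinx port is reachable at all. Here is a minimal sketch using only Ruby's standard library; the `sphinx_reachable?` name and the 2-second default timeout are my own choices, not part of Thinking Sphinx or Riddle. It only reports whether a TCP connection succeeds, so on its own it won't tell a firewall apart from a searchd bound to the wrong interface.

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port succeeds
# within `timeout` seconds. Returns false on refusal, timeout, or an
# unresolvable host name. The socket is closed as soon as the block returns.
def sphinx_reachable?(host, port = 9312, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

From a console on the jobs server, `sphinx_reachable?('sphinx')` should return true once all three problems are fixed, while `sphinx_reachable?('127.0.0.1')` reproduces the original failure on a box that runs no searchd.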
Resolution
- Set up the /etc/hosts file on both servers to point the hostname alias “sphinx” at the app/search/db server (this way I don’t need different config files for the two servers)
- Update the Thinking Sphinx config to use this hostname alias rather than localhost
- Update your sphinx.conf (I roll my own rather than letting Thinking Sphinx generate one) so searchd listens on the proper IP
- Make sure your iptables rules allow connections from the jobs server to the search server
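**Run on the search server as root to accept all traffic from the jobs server:**
    iptables -A INPUT -s 10.1.2.3 -j ACCEPT
I used to open only specific ports, but got tired of it, so now I just leave it wide open for server-to-server access. I also ran the reciprocal command on the jobs server. Note that `-A` appends the rule to the end of the INPUT chain; if that chain already ends in a catch-all REJECT or DROP rule, use `-I INPUT` instead so the ACCEPT rule is evaluated first.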
config/sphinx.yml for Thinking Sphinx
    production:
      port: 9312
      address: sphinx
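Because both servers share the same sphinx.yml, the alias has to resolve correctly on each box. A rough sketch for checking that, assuming the `address` entry is a name defined in /etc/hosts (for example a line like `10.1.2.4 search-server sphinx`, where the IP and `search-server` are hypothetical): it reads one environment's settings and resolves the name against the hosts file only, skipping DNS. The `sphinx_address` helper is made up for illustration.

```ruby
require 'yaml'
require 'resolv'

# Hypothetical helper: look up the Thinking Sphinx `address` for one environment
# and resolve it through a hosts file only (no DNS lookup).
# Returns [alias, ip, port]; raises Resolv::ResolvError if the alias is missing.
def sphinx_address(yml_path, env = 'production', hosts_path = Resolv::Hosts::DefaultFileName)
  settings = YAML.safe_load(File.read(yml_path)).fetch(env)
  name = settings.fetch('address')
  ip = Resolv::Hosts.new(hosts_path).getaddress(name)
  [name, ip, settings.fetch('port')]
end
```

On the jobs server, `sphinx_address('config/sphinx.yml')` should come back with the search server's private IP; if it returns 127.0.0.1 there, Thinking Sphinx will keep knocking on the wrong door.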
sphinx.conf on Search Server
    searchd {
      # listen = 127.0.0.1:9312   # Remove me. Sphinx's default is 0.0.0.0:9312
    }
Result
The Schroedinbug is gone. The cat died. No, wait, or did it live? Does the cat represent the bug, or the system working as it should? Argh, another problem to solve today.