Zero Downtime / High Availability with Catalyst & FastCGI external servers
Intro
Here's the idea - you simultaneously run 2 FastCgiExternalServers - a production one & a staging one. The staging one you muck around with to your heart's content. Start/stop/restart it - whatever - it won't affect your production environment. At some point you want to promote the staging stuff into production. And of course you cannot have ANY downtime.
I'm going to use an application name of 'mt' for this example - so replace it (or not) with your own - same for 'example.com'
All of these examples run fine on CentOS
You're probably going to have to change some paths too
Get Started
So you need 5 files - 2 httpd.conf's, 2 start/stop/restart fastcgi scripts, & 1 switchover script
* The 2 httpd configuration files are a 'production one' - which you're running 99% of the time, and a 'temporary one' - that you use to gracefully promote your staging stuff into production.
* The 2 start/stop/restart scripts are used to independently start/stop/restart your fastcgi external servers.
* The 1 switchover script will actually do the promotion with zero downtime. When you're done promoting, your production environment will exactly match your staging environment.
1. create 2 httpd.conf's - 'httpd_prod.conf' & 'httpd_temp.conf' - each with a staging & production virtual server:
1. httpd_prod.conf:
# the staging socket
FastCgiExternalServer /tmp/mt.stage -socket /tmp/mt.stage.socket
# the production socket
FastCgiExternalServer /tmp/mt.prod -socket /tmp/mt.prod.socket
2. httpd_temp.conf: (so named because this is a temporary state when you want to make staging = production)
Note the only difference is that there's only 1 external server (the staging one) & the production virtual host now points to it
FastCgiExternalServer /tmp/mt.stage -socket /tmp/mt.stage.socket
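The virtual hosts themselves aren't shown above, so here's a rough sketch of how the two files might differ, written as a small shell helper that prints either variant. The `emit_fastcgi_conf` name, the `*:80` address, `www.example.com` as the production host name, and the `Alias / ...` lines are assumptions for illustration (an Alias onto the FastCgiExternalServer path is the usual mod_fastcgi way to route a host to an external server), so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper: print the FastCGI part of one config variant.
# $1 is "prod" (normal state) or "temp" (production served by staging).
emit_fastcgi_conf () {
    variant=$1
    echo "FastCgiExternalServer /tmp/mt.stage -socket /tmp/mt.stage.socket"
    if [ "$variant" = "prod" ]; then
        echo "FastCgiExternalServer /tmp/mt.prod -socket /tmp/mt.prod.socket"
        prod_target=/tmp/mt.prod
    else
        # temporary state: only the staging external server is declared
        prod_target=/tmp/mt.stage
    fi
    # host names and the *:80 address are assumptions
    cat <<EOF
<VirtualHost *:80>
    ServerName stage.example.com
    Alias / /tmp/mt.stage/
</VirtualHost>
<VirtualHost *:80>
    ServerName www.example.com
    Alias / $prod_target/
</VirtualHost>
EOF
}
```

Running `emit_fastcgi_conf prod` and `emit_fastcgi_conf temp` side by side shows the only two differences: the extra FastCgiExternalServer line and the production Alias target.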
2. You need 2 /etc/init.d (or whatever) scripts that start & stop your fastcgi servers: mt.prod & mt.stage. The differences are that mt.prod uses /tmp/mt.prod.socket & mt.stage uses /tmp/mt.stage.socket for their sockets, & they have different pid files (also I don't start as many processes for my staging daemons). Finally, REALLY make sure PROD starts up before moving on.
1. mt.prod:
#!/bin/sh
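# CHANGE this to your application's directory
APP_PATH=
# must match the -socket path in httpd_prod.conf
FCGI_SOCKET_PATH=/tmp/mt.prod.socket
# the switchover script reads this pid file
PID_PATH=/tmp/mt.prod.pid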
case $1 in
start)
echo -n "Starting PROD MT: mt_fastcgi.pl"
cd $APP_PATH
script/mt_fastcgi.pl -l $FCGI_SOCKET_PATH -p $PID_PATH -d -n 5
echo
# make real sure it's started
PID=`cat $PID_PATH`
if [ -n "$PID" ]
then
echo "Started"
else
echo "Start failed - trying again"
unlink $FCGI_SOCKET_PATH
$0 start
fi
;;
stop)
echo -n "Stopping PROD MT: "
PID=`cat $PID_PATH`
if [ -n "$PID" ]
then
echo -n kill $PID
kill $PID
echo
unlink $FCGI_SOCKET_PATH
else
echo MT not running
fi
;;
restart|force-reload)
$0 stop
sleep 10
$0 start
;;
*)
echo "Usage: /etc/init.d/mt.prod { stop | start | restart }"
exit 1
;;
esac
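The start branch above retries by calling itself, which can loop forever if the app never comes up. A bounded wait is one alternative. This is a minimal sketch, assuming the pid file holds a single pid and that the calling user is allowed to signal that process; `wait_for_pid` is a made-up name.

```shell
#!/bin/sh
# wait_for_pid PIDFILE [TRIES]
# Succeeds once PIDFILE exists, is non-empty and names a live process.
# Gives up (non-zero status) after TRIES checks, one second apart.
wait_for_pid () {
    pid_file=$1
    tries=${2:-5}
    while [ "$tries" -gt 0 ]; do
        # kill -0 sends no signal, it only checks the process exists
        if [ -s "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In mt.prod this could replace the recursive retry, something like `wait_for_pid $PID_PATH 10 || exit 1` right after launching mt_fastcgi.pl.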
2. mt.stage:
#!/bin/sh
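# CHANGE this to your staging application's directory
APP_PATH=
# must match the staging -socket path in the httpd configs
FCGI_SOCKET_PATH=/tmp/mt.stage.socket
# assumed name, mirroring /tmp/mt.prod.pid
PID_PATH=/tmp/mt.stage.pid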
case $1 in
start)
echo -n "Starting STAGE MT: mt_fastcgi.pl"
cd $APP_PATH
script/mt_fastcgi.pl -l $FCGI_SOCKET_PATH -p $PID_PATH -d
echo
;;
stop)
echo -n "Stopping STAGE MT: "
PID=`cat $PID_PATH`
if [ -n "$PID" ]
then
echo -n kill $PID
kill $PID
echo
unlink $FCGI_SOCKET_PATH
else
echo STAGE MT not running
fi
;;
restart|force-reload)
$0 stop
sleep 10
$0 start
;;
*)
echo "Usage: /etc/init.d/mt.stage { stop | start | restart }"
exit 1
;;
esac
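Since the two scripts differ only in a handful of values, one option is to keep those values in a single helper so the socket and pid paths can't drift apart. This is only a sketch: `set_env_paths` and `NPROC_ARGS` are made-up names, and /tmp/mt.stage.pid is an assumed pid file name chosen to mirror /tmp/mt.prod.pid.

```shell
#!/bin/sh
# set_env_paths ENV
# ENV is "prod" or "stage". Sets FCGI_SOCKET_PATH, PID_PATH and
# NPROC_ARGS (extra process-count arguments for mt_fastcgi.pl).
set_env_paths () {
    case $1 in
        prod)  NPROC_ARGS="-n 5" ;;   # production runs 5 processes
        stage) NPROC_ARGS="" ;;       # staging uses the default count
        *)     echo "unknown environment: $1" >&2; return 1 ;;
    esac
    FCGI_SOCKET_PATH=/tmp/mt.$1.socket
    PID_PATH=/tmp/mt.$1.pid
}
```

Each init script would then call `set_env_paths prod` (or `stage`) at the top and start the app with `script/mt_fastcgi.pl -l $FCGI_SOCKET_PATH -p $PID_PATH -d $NPROC_ARGS`.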
3. The Switchover script - this will make staging -> production without any downtime. Note you'll probably have to change BASE_HTTP_DIR, the path/method you use to gracefully restart apache, and the location of your production start/stop script:
#!/bin/sh
# CHANGE this probably
BASE_HTTP_DIR=/etc/httpd/conf
HTTPD_CONF=$BASE_HTTP_DIR/httpd.conf
# Maybe also change the way you gracefully restart apache
# Maybe also change the path to your mt.prod start/stop/restart script
link_to () {
# First symlink httpd.conf
unlink $HTTPD_CONF
ln -s $BASE_HTTP_DIR/$1 $HTTPD_CONF
if [ -e $HTTPD_CONF ]; then
echo "Symlinked $HTTPD_CONF to $BASE_HTTP_DIR/$1"
else
echo "$HTTPD_CONF doesn't exist - error symlinking - I'm outta here"
exit 1
fi
# Restart apache nicely - you may need to change this to however you do it
echo "Gracefully restarting apache..."
/etc/init.d/httpd graceful
}
# Make production = staging
echo "Switching over to temporary config..."
link_to httpd_temp.conf
# okay production is now running staging
# Now restart the production socket
echo "Restarting production socket..."
/etc/init.d/mt.prod restart
sleep 5
PID=`cat /tmp/mt.prod.pid`
RUNNING=`ps auxwww $PID | grep $PID | grep -v grep | grep -v ps`
# Make sure it's back up
if [ "$RUNNING" ]; then
echo "Production socket looking good: $RUNNING"
else
echo "Production socket didn't start up!!"
echo "Site running staging config/socket - be careful."
echo "Fix the problem & re-run $0 to get back to production config/socket."
# exit non-zero so callers can tell the switchover did not finish
exit 1
fi
# Now switch back over to the production httpd.conf
echo "Switching back to production config..."
link_to httpd_prod.conf
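link_to above removes httpd.conf and then recreates it, so for a moment there's no config file at that path. If that window bothers you, one way around it is to build the new symlink under a temporary name and rename it over the old one. This is a sketch, assuming the link points at a regular file (not a directory) and that both names live in the same directory; `swap_link` is a made-up name.

```shell
#!/bin/sh
# swap_link TARGET LINK
# Point LINK at TARGET by creating a temporary symlink next to it and
# renaming that over LINK, so LINK never disappears in between.
swap_link () {
    target=$1
    link=$2
    tmp_link="$link.new.$$"
    # mv replaces LINK itself because LINK does not resolve to a directory
    ln -s "$target" "$tmp_link" && mv -f "$tmp_link" "$link"
}
```

Inside link_to, the unlink/ln pair could then become `swap_link $BASE_HTTP_DIR/$1 $HTTPD_CONF`, with the existing `-e` check left in place.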
Putting It All Together
Start
1. First you start up both mt.prod & mt.stage so you've got sockets /tmp/mt.prod.socket & /tmp/mt.stage.socket
2. Symlink httpd.conf -> httpd_prod.conf
3. Start apache
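Before starting apache it can be worth checking that both sockets really exist, since apache will happily start without them and only fail on the first request. A small sketch of such a check, with `sockets_ready` as a made-up helper name:

```shell
#!/bin/sh
# sockets_ready PATH...
# Succeeds only if every PATH exists and is a unix socket.
sockets_ready () {
    for sock in "$@"; do
        if [ ! -S "$sock" ]; then
            echo "missing socket: $sock" >&2
            return 1
        fi
    done
    return 0
}
```

For example: `sockets_ready /tmp/mt.prod.socket /tmp/mt.stage.socket && /etc/init.d/httpd start`.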
Edit/Muck with staging
Now make changes & restart your staging fastcgi servers as necessary:
/etc/init.d/mt.stage restart
This won't affect production at all.
Go to http://stage.example.com & check out your changes
Promote
Okay now you're happy with staging & are ready to move the staging stuff into production.
Run the switchover script - this will promote the staging stuff to production without any downtime. It will:
1. Move httpd.conf symlink to point to httpd_temp.conf
2. restart apache gracefully
3. now your production server is using the 'staging' socket
4. restart your production socket /etc/init.d/mt.prod restart (or whatever)
5. Move httpd.conf symlink back to httpd_prod.conf
6. restart apache gracefully
Now production matches staging without any downtime - woohoo!
Rinse. Lather. Repeat