Wednesday, March 03, 2010

Automated deployment systems: push vs. pull

I've been immersed in the world of automated deployment systems for quite a while. Because I like Python, I've been using Fabric, but I also dabbled in Puppet. When people are asked about alternatives to Puppet in the Python world, many mention Fabric, but in fact these two systems are very different. Their main difference is the topic of this blog post.

Fabric is what I consider a 'push' automated deployment system: you install Fabric on a server, and from there you push deployments by running remote commands via ssh on a set of servers. In the Ruby world, an example of a push system is Capistrano.

The main advantages of a 'push' system are:
  • control: everything is synchronous, and under your control. You can see right away if something went wrong, and you can correct it immediately.
  • simplicity: in the case of Fabric, a 'fabfile' is just a collection of Python functions that copy files over to a remote server and execute commands over ssh on that server; it's all very easy to set up and run
The main disadvantages of a 'push' system are:
  • lack of full automation: it's not usually possible to boot a server and have it configure itself without some sort of client/server protocol which push systems don't generally support (see 'pull' systems below for that)
  • lack of scalability: when you're dealing with hundreds of servers, a push system starts showing its limits, unless it makes heavy use of threading or multi-processing
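
To make the push model concrete, here is a minimal sketch in plain Python rather than Fabric itself. It assumes the `ssh` client is on the PATH and that key-based login already works; the function names (`ssh_run`, `push`, `failed_hosts`) are made up for illustration. The thread pool is one way to soften the scalability limit mentioned above, not a claim about how any particular tool does it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command):
    """Run one command on one host over ssh; return (exit code, output)."""
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr


def push(hosts, command, runner=ssh_run, max_workers=10):
    """Push the same command to every host and collect the results.

    hosts must be a list. Up to max_workers hosts are contacted at the
    same time instead of one after another. The runner argument is a
    hypothetical hook so the ssh call can be swapped out in tests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))


def failed_hosts(results):
    """Hosts whose command exited non-zero, so they can be fixed right away."""
    return sorted(host for host, (code, _) in results.items() if code != 0)
```

Calling `push(["web1", "web2"], "uptime")` returns as soon as every host has answered, which keeps the synchronous, see-it-immediately feel of a push system while spreading the waiting across threads.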
Puppet is what I consider a 'pull' automated deployment system (actually to be more precise, it is a configuration management system). In such a system, you have a server which acts as a master, and clients which contact the master to find out what they need to do, thus pulling their configuration information from the master. In Puppet, configuration files are called manifests. They are written in a specific language and they are declarative, i.e. they tell each client what to do, not how to do it. The Puppet client software running on each server knows how to interpret the manifest files and how to translate them into actions specific to the operating system of that server. For example, you specify in your manifest file that you want a user created and you don't need to say 'run the adduser command on server X'. Other examples of 'pull' deployment/configuration management systems are bcfg2 (Python), Chef (Ruby) and slack (Perl). A newcomer in the Python world is a port of Chef called kokki (it looks like it's very much in its infancy still, but I hope the author will continue to actively develop it).
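
The 'what, not how' idea can be sketched in a few lines of Python. This is not Puppet's manifest language or its internals: the `MANIFEST` structure, the `USER_COMMANDS` table and the `plan` function are all hypothetical, and the command table covers only a tiny illustrative subset of platforms.

```python
# Hypothetical manifest: it states the desired end state, not the commands.
MANIFEST = [
    {"type": "user", "name": "deploy", "ensure": "present"},
    {"type": "user", "name": "olduser", "ensure": "absent"},
]

# Per-platform command templates (illustrative subset only).
USER_COMMANDS = {
    "linux": {
        "present": ["useradd", "{name}"],
        "absent": ["userdel", "{name}"],
    },
    "freebsd": {
        "present": ["pw", "useradd", "{name}"],
        "absent": ["pw", "userdel", "{name}"],
    },
}


def plan(manifest, platform, existing_users):
    """Translate declared resources into the commands this platform needs.

    Resources that are already in the desired state produce no command,
    so running the same manifest twice changes nothing the second time.
    """
    commands = []
    for resource in manifest:
        if resource["type"] != "user":
            raise ValueError("unsupported resource type: " + resource["type"])
        exists = resource["name"] in existing_users
        wanted = resource["ensure"] == "present"
        if exists == wanted:
            continue  # already in the declared state
        template = USER_COMMANDS[platform][resource["ensure"]]
        commands.append([part.format(name=resource["name"]) for part in template])
    return commands
```

The same manifest yields `useradd deploy` on a Linux client and `pw useradd deploy` on a FreeBSD client, and yields nothing at all on a client where the user already exists. The person writing the manifest never names either command.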

The main advantages of a 'pull' system are:
  • full automation capabilities: it is possible, and indeed advisable, to fully automate the configuration of a newly booted server using a 'pull' deployment system (for details on how I've done it with Puppet, see this post)
  • increased scalability: in a 'pull' system, clients contact the server independently of each other, so the system as a whole is more scalable than a 'push' system
The main disadvantages of a 'pull' system are:
  • proprietary configuration management language: with the notable exception of Chef, which uses pure Ruby for its configuration 'recipes', most other pull systems use their own proprietary way of specifying the configuration to be deployed (Puppet's language looks like a cross between Perl and Ruby, while bcfg2 uses...gasp...XML); this turns out to be a pretty big drawback, because if you're not using the system on a daily basis, you're guaranteed to forget it (as happened to me with Puppet)
  • scalability is still an issue: unless you deploy several master servers and keep them in sync, that one master will start getting swamped as you add more and more clients and thus will become your bottleneck
My particular preference is to use a 'pull' system for the initial configuration of a server, including all the packages necessary to deploy my application (for example tornado). For the actual application code deployment, I prefer to use a 'push' system, because it gives me more control over how exactly I do the deployment. I can take a server out of the load balancer, deploy, test, then put it back, rinse and repeat.
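
The 'take it out, deploy, test, put it back' loop might look roughly like this. It is a sketch only: the four callables stand in for whatever your load balancer and deployment tooling actually provide, and their names are invented here.

```python
def rolling_deploy(servers, remove_from_lb, deploy, smoke_test, add_to_lb):
    """Deploy to one server at a time, stopping at the first failed test.

    Returns (done, failed): the servers that were deployed and put back
    in rotation, and the server whose smoke test failed (or None).
    A failed server is deliberately left out of the load balancer, and
    the servers after it are not touched.
    """
    done = []
    for server in servers:
        remove_from_lb(server)
        deploy(server)
        if not smoke_test(server):
            return done, server
        add_to_lb(server)
        done.append(server)
    return done, None
```

Stopping at the first failure is the kind of control a push system makes easy: the remaining servers keep serving the old code while you investigate the one that broke.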

In discussions with Holger Krekel at PyCon, I realized that execnet might be a good replacement for Fabric for my needs. It already provides remote command execution via ssh, and an rsync-like file transfer protocol. All it needs is a small library of functions on top to do common system administration tasks such as running commands as sudo, etc. I also want to look into kokki as a replacement for Puppet in my deployment architecture.

A parting thought: my colleague Dan Mesh suggested using a queuing mechanism for the client-server protocol in a 'pull' system. In fact, I am becoming more and more convinced that as far as scalability is concerned, when in doubt, use a queuing mechanism. In this deployment architecture, the master would post tasks to be done by a specific client to a central queue. The client would check the queue periodically for a task assigned to it, execute it, then send a report back to the server when done. Of course, you need to worry about authentication in this scenario, but it seems that it would solve a lot of the scalability issues that both push and pull systems exhibit. Who knows, we may build it at Evite and open source it...so stay tuned ;-)
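
Here is a rough in-memory sketch of that queue-based protocol, just to pin down the moving parts. Everything in it is an assumption: `TaskBroker` stands in for a real networked message queue, authentication is left out entirely, and `poll_once` is the body of what would be a periodic loop on each client.

```python
import queue
from collections import defaultdict


class TaskBroker:
    """In-memory stand-in for the central queue (no network, no auth)."""

    def __init__(self):
        self._tasks = defaultdict(queue.Queue)  # one task queue per client
        self.reports = queue.Queue()            # reports going back to the master

    def post(self, client_id, task):
        """Master side: enqueue a task for one specific client."""
        self._tasks[client_id].put(task)

    def fetch(self, client_id):
        """Client side: return the next task for this client, or None."""
        try:
            return self._tasks[client_id].get_nowait()
        except queue.Empty:
            return None

    def report(self, client_id, task, result):
        """Client side: send the outcome of a task back to the master."""
        self.reports.put((client_id, task, result))


def poll_once(broker, client_id, execute):
    """One iteration of a client's periodic check.

    Returns True if a task was found and executed, False if this
    client's queue was empty.
    """
    task = broker.fetch(client_id)
    if task is None:
        return False
    broker.report(client_id, task, execute(task))
    return True
```

The master never opens a connection to a client and never waits on one; it only writes to the queue and reads reports. That decoupling is where the hoped-for scalability would come from.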

22 comments:

Alex said...

Good compare/contrast between the two different approaches. I've actually separated the two categories of tools not by "push" vs "pull" but rather orchestration vs configuration management. Orchestration provides a centralized coordination of multiple steps in the distributed environment. Configuration management drives each host to a particular state based on the specification (be it proprietary domain-specific or general purpose language).

Dougal said...

I must admit I am not quite following your push/pull description.

You seem to suggest with fabric you can't bootstrap a system, but this is exactly what I am doing. Using rackspace cloud I've been deploying Django sites in a matter of minutes after first getting access to the fresh VM.

"you install Fabric on a server"

I don't have Fabric on my server. Only on my desktop.

Am I doing something horribly wrong or have I just miss-followed your post?

Grig Gheorghiu said...

Dougal -- thanks for the comment. When I said that you can't bootstrap a system with Fabric, I was referring to a fully automated bootstrapping/application deployment scenario, where you just boot up a VM and have the boot mechanism configure it so it contacts a master server and 'pulls' its configuration information from there, then proceeds to configure itself with no further intervention from you. Note that you didn't run any commands on it, other than booting it up (for more details, see http://agiletesting.blogspot.com/2009/09/bootstrapping-ec2-images-as-puppet.html)

In your scenario, you boot up a VM, then from your desktop you 'push' commands to it via Fabric, telling the VM to install the appropriate packages, then your own application. This works well up to a point, and I don't recommend changing this procedure if it works for you. However, it starts to break down when you're dealing with hundreds of VMs.

So at the end of the day it's a question of scale, which is why the word 'scalability' appears so often in my blog post.

In any case, I am a big fan of Fabric myself, so I'm glad it works well for you too.

Dougal said...

Great - that cleared it up. Thanks.

I think I may have to start experimenting with puppet but fabric is indeed working for my small deployments at the moment.

thanks for the heads up.

Unknown said...

Nice write-up.

I actually think these push systems are nice complements to Puppet et al - use them for ad-hoc management when necessary, and for workflow-based tasks that Puppet doesn't (yet) support, and use Puppet for the core configurations.

For the record, Puppet can be used in more like a push mode, but it's still not ad-hoc - you can push the code and trigger the run manually, but it's still going to be model-driven and tend to be a comprehensive configuration.

For me the bigger question is are you making a kind of one-time change, like triggering a deployment, or are you declaring the overall state of the system? Eventually Puppet will excel at the deployment triggers, too, but for now it's better to use its declarative tools in combination with an ad-hoc tool like Fabric or capistrano.

--Luke (founder of Puppet project)

Grig Gheorghiu said...

Luke -- thanks for the comment, I fully agree with you that push and pull tools can be successfully used in a complementary fashion. The Puppet + Fabric combo has been working well for me. As I said, a gripe I have w/ Puppet is that its configuration language is proprietary and easy to forget, unless you use it on a daily basis.

Unknown said...

http://www.infrastructures.org/papers/bootstrap/bootstrap.html

The canonical operations push/pull paper.

Grig Gheorghiu said...

Devdas -- very interesting, didn't know about that paper, but makes a lot of sense ;-) Thanks for the pointer.

Ronald said...

Hi,

I did a bit of research into Config management systems, and while most of them are pull, I did come across Smartfrog from HP research, which was a peer-to-peer push/pull system that looked very interesting.

I use Fabric for most deployment functions but it falls short on real config management. Currently I use buildout, due to a fairly small number of servers, but for scaling up I'll certainly look at smartfrog or possibly puppet again.

Ian Kallen said...

It seems like comparing Fabric to Capistrano is more apt than comparing it to Puppet. Deployment tools like Fabric and Capistrano are procedure oriented, whereas configuration management tools are more "state achievement" oriented. Yes, states are achieved by running procedures but the configuration management system should encapsulate them. I think ultimately you can have configuration management replace procedures; chef-deploy is a good example because it allows you to regard the code revision deployed as part of the configuration state.

I like what EngineYard has done (apparently RightScale does too, they copied it): they use a message system (nanite) with Chef to tell the edges they need to update. That may make the server vulnerable to the thundering herd problem, as all pull systems are, but it allows one to "push" in a scale-free manner. Multiple hosts simultaneously updating will swamp the configuration management system, but I think your point about putting a queue in to moderate a scaled-up environment and alleviate that is good.

Grig Gheorghiu said...

Ian -- thanks for the comment. I did say that Fabric is in the same category as Capistrano, your 'procedure oriented' category, my 'push' category. You're making good points. Thanks also for the link to nanite. It seems similar to what I had in mind when I was referring to a queuing mechanism, but at the same time it also seems very complicated. As they say themselves, 'it has a lot of moving parts'. I would aim for something simpler.

Anonymous said...

If you would like to see a middleware centered admin system have a look at marionette-collective.org

It's developed in ruby but even just to see one such possible design you might find it interesting. These 2 types of system complement each other very well as you describe.

Doug Lane said...

Would you categorize Cfengine as "push" or "pull"?

Grig Gheorghiu said...

Doug -- I consider cfengine a 'pull' system because each cf-agent contacts cf-serverd every N minutes to inquire about policy changes etc.

Here's a fragment from http://prefetch.net/blog/index.php/2010/07/02/cfengine-3-tutorial-part-1-system-architecture/

6. cf-agent contacts cf-serverd running on Master Policy Server(s) and pulls updated policies / configs / etc via encrypted link. This happens via execution of failsafe.cf and update.cf <—— pull from Master Policy Servers. **** Clients pull. Servers don’t “push”. Changes are done on the client opportunistically. If the network is down, nothing happens on the clients. The next time the client can contact the Master Policy Server, the change is executed. *****

Grig Gheorghiu said...

Volcane -- I am aware of mcollective, it's a bit too complicated and proprietary for my taste. But I am following your project with interest and I am definitely getting new ideas from it, so great job with it!

Anonymous said...

Your queue-based system sounds an awful lot like 'mcollective'

http://marionette-collective.org/

Anonymous said...

@grig can you explain what you mean with proprietary wrt mcollective? That's a first as far as criticism goes :)

Grig Gheorghiu said...

R.I.Pienaar don't take my criticism too hard ;-) I was referring most of all to the RPC-based language that I'd have to learn if I were to write new clients and agents:

http://marionette-collective.org/simplerpc/

In general, learning somebody's framework's language turns me off, because my brain has limited capacity. I find that the more levels of indirection I need to learn, the harder it gets to remember everything I need for my job....

That's why I prefer Chef to Puppet, because at least I know everything is pure Ruby as opposed to somebody's 'proprietary' DSL.

All this being said, I do understand that you can do a lot with mcollective out of the box, without worrying about SimpleRPC etc.

Anonymous said...

@Grig,

OK, I see what you mean, gotta be said though it's not a DSL or custom language.

It's just Ruby with a few helpers, the only DSL-like bit is the description files and they are 100% optional. But I agree getting to know the design principles behind even a native Ruby framework can be a challenge.

Bingeldac's buddy said...

This is funny.

Puppet can do both "pushes" and "pulls".

SSH to do remote administration fails for several reasons:

1) no audit trail
2) SSH keys are a giant security hole
3) SSH is tied to user management
4) Push based administration does not ensure a system is in a known state.

Steve said...

You're perfectly right in that deployments on a large number of servers will require multi-threading, but that's hardly rocket science. KwateeSDCM is a fairly simple and straightforward tool that does just that.

pradeepp said...

Hi,
I'm new to deployment automation and in search of deployment automation tools, including server provisioning. I went through your post on Push vs Pull systems and found it interesting.
Can you suggest such tools for Java and Java EE deployment automation?
I shortlisted a few tools. But can you suggest how Puppet is useful for my case?

Thanks,
Pradeep. P
