Typically you see a pair of load balancers deployed up front in HA mode, handling user traffic directly or, more commonly, serving as the origin for CDN traffic. To avoid deploying many more pairs of load balancers between the front-end app server layer and the various service layers, or between one service layer and another, one design pattern I've used successfully is an haproxy instance running locally (bound to 127.0.0.1) on each node that needs to talk to N other nodes running some type of service (see the configuration sketch after the list below). This approach has several advantages:
- No need to use up nodes, be they bare-metal servers, cloud instances, or VMs, for the sole purpose of running yet another haproxy instance (an HA configuration actually requires 2 such nodes, plus keepalived or something similar running on each)
- Potentially fewer bottlenecks, as each node fans out directly to all the services it needs to talk to, with no centralized load balancer to go through first
- Easy deployment via Chef or Puppet, by simply adding the haproxy installation to the app node's cookbook/manifest
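For concreteness, here's a minimal sketch of what the local haproxy configuration on an app node might look like. I'm using Riak (mentioned below) as the example backend; the backend name, server addresses, and timeouts are illustrative placeholders, not values from our actual setup:

```
# /etc/haproxy/haproxy.cfg on each app node (illustrative sketch)
global
    daemon
    maxconn 4096

defaults
    mode    tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# the app talks to 127.0.0.1:8098 and the local haproxy
# fans the connections out to the N service nodes behind it
frontend local_riak
    bind 127.0.0.1:8098
    default_backend riak_nodes

backend riak_nodes
    balance roundrobin
    server riak1 10.0.1.11:8098 check
    server riak2 10.0.1.12:8098 check
    server riak3 10.0.1.13:8098 check
```

Since the same config ships to every app node, this is exactly the kind of thing that drops cleanly into a Chef cookbook or Puppet manifest as a template.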
The main disadvantage of this solution is an increased number of health checks against the service nodes behind each haproxy: every app node's local haproxy checks every service node, so each service node receives one health check per app node. Also, as @lusis pointed out, in some scenarios, for example when the local haproxy instances talk to a Riak cluster, each app node may see a different image of the cluster in terms of which particular Riak node(s) it gets its data from (though I think with Riak this is the case even with a centralized load balancer approach).
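If the health-check volume becomes a concern, haproxy lets you stretch the check interval per backend; a minimal sketch, where the 5s interval and the rise/fall thresholds are arbitrary illustration values rather than a recommendation from our setup:

```
backend riak_nodes
    # check each server every 5s instead of the 2s default;
    # rise/fall set how many consecutive checks flip a server up/down
    default-server inter 5s rise 2 fall 3
    server riak1 10.0.1.11:8098 check
```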
In any case, I recommend this approach, which has worked really well for us here at NastyGal. I used a similar approach in the past at Evite as well.
Thanks to @bdha for spurring an interesting Twitter thread about this approach, and to @lusis and @obfuscurity for jumping into the discussion with their experiences. As @garethr said, somebody needs to start documenting these patterns!