Thursday, June 21, 2012

Using the Joyent Cloud API

Here are some notes I took while doing some initial experiments with provisioning machines in the Joyent Cloud. I used their CloudAPI directly, although in the future I also want to try the libcloud Joyent driver. The promise of the Joyent Cloud 'SmartMachines' is that they are really Solaris zones running on a SmartOS host, which gives you better performance (especially I/O performance) than the regular virtual machines offered by most cloud vendors. I have yet to fully verify this performance increase, but it's next on my TODO list.

Installing the Joyent CloudAPI tools


I did the following on an Ubuntu 10.04 server:

  • installed node.js -- I downloaded it in tar.gz format from http://nodejs.org/dist/v0.6.19/node-v0.6.19.tar.gz, then ran the usual './configure; make; make install' (the full command sequence is recapped after this list)
  • installed the Joyent smartdc node package by running 'npm install smartdc -g'
  • created a new ssh RSA keypair: id_rsa_joyentapi (private key) and id_rsa_joyentapi.pub (public key)
  • ran the sdc-setup utility, pointing it to the US-EAST-1 region:
# sdc-setup https://us-east-1.api.joyentcloud.com
Username (login): (root) myjoyentusername
Password:
The following keys exist in SmartDataCenter:
   [1] grig
Would you like to use an existing key? (yes) no
SSH public key: (/root/.ssh/id_rsa.pub) /root/.ssh/id_rsa_joyentapi.pub

If you set these environment variables, your life will be easier:
export SDC_CLI_URL=https://us-east-1.api.joyentcloud.com
export SDC_CLI_ACCOUNT=myjoyentusername
export SDC_CLI_KEY_ID=id_rsa_joyentapi
export SDC_CLI_IDENTITY=/root/.ssh/id_rsa_joyentapi


  • added the recommended environment variables (above) to .bash_profile and sourced the file
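
For reference, the installation boils down to roughly the following sequence of commands (the wget download and the ssh-keygen invocation are my reconstruction of the steps above; adjust versions and paths as needed):

# build node.js from source (the version that was current at the time)
wget http://nodejs.org/dist/v0.6.19/node-v0.6.19.tar.gz
tar xzf node-v0.6.19.tar.gz
cd node-v0.6.19 && ./configure && make && make install

# install the Joyent smartdc CLI package globally
npm install smartdc -g

# generate a dedicated RSA keypair for the CloudAPI
ssh-keygen -t rsa -f /root/.ssh/id_rsa_joyentapi
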
Using the Joyent CloudAPI tools

At this point I was able to use the various 'sdc' commands included in the Joyent CloudAPI toolset. For example, to list the available Joyent datacenters, I used sdc-listdatacenters:

# sdc-listdatacenters
{
 "us-east-1": "https://us-east-1.api.joyentcloud.com",
 "us-west-1": "https://us-west-1.api.joyentcloud.com",
 "us-sw-1": "https://us-sw-1.api.joyentcloud.com"
}


To list the operating system images available for provisioning, I used sdc-listdatasets (the following is just an excerpt of its output):

# sdc-listdatasets
[
 {
"id": "988c2f4e-4314-11e1-8dc3-2bc6d58f4be2",
"urn": "sdc:sdc:centos-5.7:1.2.1",
"name": "centos-5.7",
"os": "linux",
"type": "virtualmachine",
"description": "Centos 5.7 VM 1.2.1",
"default": false,
"requirements": {},
"version": "1.2.1",
"created": "2012-02-14T05:53:49+00:00"
 },
 {
"id": "e4cd7b9e-4330-11e1-81cf-3bb50a972bda",
"urn": "sdc:sdc:centos-6:1.0.1",
"name": "centos-6",
"os": "linux",
"type": "virtualmachine",
"description": "Centos 6 VM 1.0.1",
"default": false,
"requirements": {},
"version": "1.0.1",
"created": "2012-02-15T20:04:18+00:00"
 },
  {
"id": "a9380908-ea0e-11e0-aeee-4ba794c83c33",
"urn": "sdc:sdc:percona:1.0.7",
"name": "percona",
"os": "smartos",
"type": "smartmachine",
"description": "Percona SmartMachine",
"default": false,
"requirements": {},
"version": "1.0.7",
"created": "2012-02-13T19:24:17+00:00"
 },
etc

To list the machine sizes available for provisioning, I used sdc-listpackages (again, this is just an excerpt of its output):

# sdc-listpackages
[
 {
"name": "Large 16GB",
"memory": 16384,
"disk": 491520,
"vcpus": 3,
"swap": 32768,Cloud Analytics API
"default": false
 },
 {
"name": "XL 32GB",
"memory": 32768,
"disk": 778240,
"vcpus": 4,
"swap": 65536,
"default": false
 },
 {
"name": "XXL 48GB",
"memory": 49152,
"disk": 1048576,
"vcpus": 8,
"swap": 98304,
"default": false
 },
 {
"name": "Small 1GB",
"memory": 1024,
"disk": 30720,
"vcpus": 1,
"swap": 2048,
"default": true
 },
etc

Provisioning and terminating machines

To provision a machine, you use sdc-createmachine and pass it the 'urn' field of the desired dataset (OS) and the name of the desired package (size). Example:

# sdc-createmachine --dataset sdc:sdc:percona:1.3.9 --package "Large 16GB"
{
 "id": "7ccc739e-c323-497a-88df-898dc358ea40",
 "name": "a0e7314",
 "type": "smartmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:percona:1.3.9",
 "ips": [
"A.B.C.D",
"X.Y.Z.W"
 ],
 "memory": 16384,
 "disk": 491520,
 "metadata": {
"credentials": {
  "root": "",
  "admin": "",
  "mysql": ""
}
 },
 "created": "2012-06-07T17:55:29+00:00",
 "updated": "2012-06-07T17:55:30+00:00"
}

The above command provisions a Joyent SmartMachine running the Percona distribution of MySQL in the 'large' size, with 16 GB RAM. Note that the output of the command contains the external IP of the provisioned machine (A.B.C.D) and also its internal IP (X.Y.Z.W). The output also contains the passwords for the root, admin and mysql accounts, in the metadata field.

Here's another example for provisioning a machine running Ubuntu 10.04 in the 'small' size (1 GB RAM). You can also specify a machine name when you provision it:

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 --package "Small 1GB" --name ggtest
{
 "id": "dc856044-7895-4a52-bfee-35b404061920",
 "name": "ggtest",
 "type": "virtualmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
 "ips": [
"A1.B1.C1.D1",
"X1.Y1.Z1.W1"
 ],
 "memory": 1024,
 "disk": 30720,
 "metadata": {
"root_authorized_keys": ""
 },
 "created": "2012-06-07T19:28:19+00:00",
 "updated": "2012-06-07T19:28:19+00:00"
}

For an Ubuntu machine, the 'metadata' field contains the list of authorized ssh keys (which I removed from my example above). Also, note that the Ubuntu machine is of type 'virtualmachine' (so a regular KVM virtual instance) as opposed to the Percona Smart Machine, which is of type 'smartmachine' and is actually a Solaris zone within a SmartOS physical host.

To list your provisioned machines, you use sdc-listmachines:

# sdc-listmachines
[
 {
"id": "36b50e4c-88d2-4588-a974-11195fac000b",
"name": "db01",
"type": "smartmachine",
"state": "running",
"dataset": "sdc:sdc:percona:1.3.9",
"ips": [
  "A.B.C.D",
  "X.Y.Z.W"
],
"memory": 16384,
"disk": 491520,
"metadata": {},
"created": "2012-06-04T18:03:18+00:00",
"updated": "2012-06-07T00:39:20+00:00"
 },
 {
"id": "dc856044-7895-4a52-bfee-35b404061920",
"name": "ggtest",
"type": "virtualmachine",
"state": "running",
"dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
"ips": [
  "A1.B1.C1.D1",
  "X1.Y1.Z1.W1"
],
"memory": 1024,
"disk": 30720,
"metadata": {
  "root_authorized_keys": ""
},
"created": "2012-06-07T19:30:29+00:00",
"updated": "2012-06-07T19:30:38+00:00"
 }
]

Note that immediately after provisioning a machine, its state (as indicated by the 'state' field in the output of sdc-listmachines) will be 'provisioning'. The state will change to 'running' once the provisioning process is done. At that point you should be able to ssh into the machine using the private key you created when installing the CloudAPI tools.
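
Here's a minimal sketch of waiting for a freshly provisioned machine to come up and then logging in (the machine id and IP are the placeholders from the example above, and the JSON parsing via grep is deliberately naive):

# poll sdc-listmachines until the machine leaves the 'provisioning' state
MACHINE_ID=dc856044-7895-4a52-bfee-35b404061920
while sdc-listmachines | grep -A 3 "$MACHINE_ID" | grep -q '"state": "provisioning"'; do
    echo "still provisioning, waiting..."
    sleep 15
done

# ssh in with the private key registered via sdc-setup
ssh -i /root/.ssh/id_rsa_joyentapi root@A1.B1.C1.D1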

To terminate a machine, you first need to stop it via sdc-stopmachine, then delete it via sdc-deletemachine. Both of these tools take the id of the machine as a parameter. If you try to delete a machine without first stopping it, or without waiting long enough for the machine to reach the 'stopped' state, you will get a message similar to 'Requested transition is not acceptable due to current resource state'.
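
Putting that together, tearing down a machine looks roughly like this (a sketch only: I'm assuming the machine id is passed as the sole argument, and I'm sleeping instead of properly polling for the 'stopped' state):

MACHINE_ID=dc856044-7895-4a52-bfee-35b404061920
sdc-stopmachine $MACHINE_ID
# give the machine time to reach the 'stopped' state
# (better: poll sdc-listmachines until you see "state": "stopped")
sleep 60
sdc-deletemachine $MACHINE_ID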

Bootstrapping a machine with user data

In my opinion, a cloud API for provisioning instances/machines is only useful if it offers a bootstrapping mechanism for running user-specified scripts on first boot. This enables integration with configuration management tools such as Chef or Puppet. Fortunately, the Joyent CloudAPI does support this kind of bootstrapping via its Metadata API.

For a quick example of a customized bootstrapping action, I changed the hostname of an Ubuntu machine and also added it to /etc/hostname. This is a toy example. In a real-life situation, you would instead download a script from one of your servers and run it in order to install whatever initial packages you need, configure the machine as a Chef or Puppet client, and so on. In any case, you need to spell out the commands you want the machine to run during its initial provisioning boot. You do that by defining the metadata 'user-script' variable:

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 --package "Small 1GB" --name ggtest2 --metadata user-script='hostname ggtest2; echo ggtest2 > /etc/hostname'
{
 "id": "379c0cad-35bf-462a-b680-fc091c74061f",
 "name": "ggtest2",
 "type": "virtualmachine",
 "state": "provisioning",
 "dataset": "sdc:sdc:ubuntu-10.04:1.0.1",
 "ips": [
"A2.B2.C2.D2",
"X2.Y2.Z2.W2"
 ],
 "memory": 1024,
 "disk": 30720,
 "metadata": {
"user-script": "hostname ggtest2; echo ggtest2 > /etc/hostname",
"root_authorized_keys": ""
 },
 "created": "2012-06-08T23:17:44+00:00",
 "updated": "2012-06-08T23:17:44+00:00"
}

Note that the metadata field now contains the user-script variable that I specified.
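
In a real-life bootstrap, the user-script would typically just fetch a script from one of your own servers and execute it. Something along these lines (the URL, the machine name and the presence of curl on the image are all assumptions here):

# sdc-createmachine --dataset sdc:sdc:ubuntu-10.04:1.0.1 --package "Small 1GB" --name web01 --metadata user-script='curl -s http://my.config.server/bootstrap.sh | bash'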

Collecting performance metrics with Joyent Cloud Analytics

The Joyent Cloud Analytics API lets you define metrics that you want to query for on your machines in the Joyent cloud. Those metrics are also graphed on the Web UI dashboard as you define them, which is a nice touch. For now there aren't that many such metrics available, but I hope their number will increase.

Joyent uses a specific nomenclature for the Analytics API. Here are some definitions, verbatim from their documentation (CA means Cloud Analytics):

A metric is any quantity that can be instrumented using CA. For example:

  • Disk I/O operations
  • Kernel thread executions
  • TCP connections established
  • MySQL queries
  • HTTP server operations
  • System load average


When you want to actually gather data for a metric, you create an instrumentation. The instrumentation specifies:
  • which metric to collect
  • an optional predicate based on the metric's fields (e.g., only collect data from certain hosts, or data for certain operations)
  • an optional decomposition based on the metric's fields (e.g., break down the results by server hostname)
  • how frequently to aggregate data (e.g., every second, every hour, etc.)
  • how much data to keep (e.g., 10 minutes' worth, 6 months' worth, etc.)
  • other configuration options
To get started with this API, you need to first see what analytics/metrics are available. You do that by calling sdc-describeanalytics (what follows is just a fragment of the output):

# sdc-describeanalytics
 "metrics": [
{
  "module": "cpu",
  "stat": "thread_samples",
  "label": "thread samples",
  "interval": "interval",
  "fields": [
    "zonename",
    "pid",
    "execname",
    "psargs",
    "ppid",
    "pexecname",
    "ppsargs",
    "subsecond"
  ],
  "unit": "samples"
},
{
  "module": "cpu",
  "stat": "thread_executions",
  "label": "thread executions",
  "interval": "interval",
  "fields": [
    "zonename",
    "pid",
    "execname",
    "psargs",
    "ppid",
    "pexecname",
    "ppsargs",
    "leavereason",
    "runtime",
    "subsecond"
  ],
etc

You can create instrumentations either via the Web UI (go to the Analytics tab) or via the command line API.

Here's an example of creating an instrumentation for file system logical operations via the sdc-createinstrumentation API:


# sdc-createinstrumentation -m fs -s logical_ops

{
  "module": "fs",
  "stat": "logical_ops",
  "predicate": {},
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340228876662,
  "value-scope": "interval",
  "id": "17",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/17/value/raw",
      "name": "value_raw"
    }
  ]
}

To list the instrumentations you have created so far, you use sdc-listinstrumentations:


# sdc-listinstrumentations

[
  {
    "module": "fs",
    "stat": "logical_ops",
    "predicate": {},
    "decomposition": [],
    "value-dimension": 1,
    "value-arity": "scalar",
    "enabled": true,
    "retention-time": 600,
    "idle-max": 3600,
    "transformations": {},
    "nsources": 2,/
    "granularity": 1,
    "persist-data": false,
    "crtime": 1340228876662,
    "value-scope": "interval",
    "id": "17",
    "uris": [
      {
        "uri": "/myjoyentusername/analytics/instrumentations/17/value/raw",
        "name": "value_raw"
      }
    ]
  }
]

To retrieve the actual metrics captured by a given instrumentation, call sdc-getinstrumentation and pass it the instrumentation id:

# sdc-getinstrumentation -v 17
{
  "value": 1248,
  "transformations": {},
  "start_time": 1340229361,
  "duration": 1,
  "end_time": 1340229362,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340229361,
  "requested_duration": 1,
  "requested_end_time": 1340229362
}

You can see how this could easily be integrated with something like Graphite in order to keep historical information about these metrics.
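
As a quick sketch, a polling loop could push the raw value into Graphite's plaintext protocol on port 2003 (the Graphite hostname and metric name are made up, and the grep/tr parsing is deliberately crude):

# poll instrumentation 17 every 10 seconds and feed the value to carbon
while true; do
    VALUE=$(sdc-getinstrumentation -v 17 | grep '"value"' | head -1 | tr -dc '0-9')
    echo "joyent.db01.fs_logical_ops $VALUE $(date +%s)" | nc graphite.example.com 2003
    sleep 10
done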

You can dig deeper into a specific metric by decomposing it by different fields, such as the application name. For example, to see file system logical operations broken down by application name, you would call:


# sdc-createinstrumentation -m fs -s logical_ops --decomposition execname

{
  "module": "fs",
  "stat": "logical_ops",
  "predicate": {},
  "decomposition": [
    "execname"
  ],
  "value-dimension": 2,
  "value-arity": "discrete-decomposition",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340231734049,
  "value-scope": "interval",
  "id": "18",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/18/value/raw",
      "name": "value_raw"
    }
  ]
}

Now if you retrieve the value for this instrumentation, you see several values in the output, one per application that performed file system logical operations:



# sdc-getinstrumentation -v 18
{
  "value": {
    "grep": 4,
    "ksh93": 5,
    "cron": 7,
    "gawk": 15,
    "svc.startd": 2,
    "mysqld": 163,
    "nscd": 27,
    "top": 159
  },
  "transformations": {},
  "start_time": 1340231762,
  "duration": 1,
  "end_time": 1340231763,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340231762,
  "requested_duration": 1,
  "requested_end_time": 1340231763
}

Another useful technique is to isolate metrics pertaining to a specific host (or 'zonename' in Joyent parlance). For this, you specify a predicate that filters on a specific machine id (you can see the id of a machine when you call sdc-listmachines). Here's an example that captures the CPU wait time for a Percona SmartMachine I provisioned earlier:


# sdc-createinstrumentation -m cpu -s waittime -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'

{
  "module": "cpu",
  "stat": "waittime",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232271092,
  "value-scope": "interval",
  "id": "19",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/19/value/raw",
      "name": "value_raw"
    }
  ]
}

You can combine decomposition with predicates. For example, here's how to create an instrumentation for CPU usage time decomposed by CPU mode (user, kernel):

# sdc-createinstrumentation -m cpu -s usage -n cpumode -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'
{
  "module": "cpu",
  "stat": "usage",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [
    "cpumode"
  ],
  "value-dimension": 2,
  "value-arity": "discrete-decomposition",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232361944,
  "value-scope": "point",
  "id": "20",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/20/value/raw",
      "name": "value_raw"
    }
  ]
}

Now when you retrieve the values for this instrumentation, you can see them separated by CPU mode:

# sdc-getinstrumentation -v 20
{
  "value": {
    "kernel": 24,
    "user": 28
  },
  "transformations": {},
  "start_time": 1340232390,
  "duration": 1,
  "end_time": 1340232391,
  "nsources": 2,
  "minreporting": 2,
  "requested_start_time": 1340232390,
  "requested_duration": 1,
  "requested_end_time": 1340232391
}


Finally, here's a MySQL-specific instrumentation that you can create on a machine running MySQL, such as a Percona SmartMachine. This one is for capturing MySQL queries:

# sdc-createinstrumentation -m mysql -s queries -p '{"eq": ["zonename","36b50e4c-88d2-4588-a974-11195fac000b"]}'
{
  "module": "mysql",
  "stat": "queries",
  "predicate": {
    "eq": [
      "zonename",
      "36b50e4c-88d2-4588-a974-11195fac000b"
    ]
  },
  "decomposition": [],
  "value-dimension": 1,
  "value-arity": "scalar",
  "enabled": true,
  "retention-time": 600,
  "idle-max": 3600,
  "transformations": {},
  "nsources": 0,
  "granularity": 1,
  "persist-data": false,
  "crtime": 1340232562361,
  "value-scope": "interval",
  "id": "22",
  "uris": [
    {
      "uri": "/myjoyentusername/analytics/instrumentations/22/value/raw",
      "name": "value_raw"
    }
  ]
}

Overall, I found the Joyent Cloud API and its associated Analytics API fairly easy to use, once I got past some nomenclature quirks. I also want to mention that the support I got from Joyent was very, very good. Replies to questions regarding some of the topics I discussed here were given promptly and knowledgeably. My next step is gauging the performance of MySQL on a SmartMachine, when compared to a similar-sized instance running in the Amazon EC2 cloud. Stay tuned.

    3 comments:

    John Esposito said...

    Hi Grig,

    I found your blog today while searching for writers who cover performance-related topics. As a community curator at DZone.com, I write and look for content that is interesting to a readership of advanced developers, and I think they'd be interested in checking out some of your blog posts.

    If you're interested, I'd like to let you know about our MVB (most valuable blogger) program, which is now at 600 strong. Here are the details about that: www.dzone.com/aboutmvb With the quality of writing displayed in your blog, I'd be honored to invite you into our program. You get a shiny web-badge :) Let me know what you think!

    Thanks, and take care,

    John Esposito
    Content Curator, DZone
    jesposito@dzone.com

    Grig Gheorghiu said...

    Hi John -- thanks, I am aware of your MVB program but I am not ready for it yet. Thanks for getting in touch in any case. If you want to publish this particular blog post on DZone, please feel free.

    Grig

    Anonymous said...

    So you are using the SmartOS-based appliance for Percona? That seems smart, since SmartOS is essentially running on bare-metal.
