Monday, October 10, 2005

Mini HOWTO #2: system monitoring via SNMP

Goal: We want to monitor system resources such as CPU utilization, memory utilization, disk space, processes and system load via SNMP.

Solution: Install and configure Net-SNMP

1. Install Net-SNMP
  • if installing from source, the configuration file snmpd.conf will go into /usr/local/share/snmp
  • by default there is no configuration file; it can be generated via the snmpconf Perl utility
2. Configure Net-SNMP by editing /usr/local/share/snmp/snmpd.conf

2a. Keep access control simple; the following entries can be defined instead of the more complicated com2sec, group, etc. directives:

# rwuser: an SNMPv3 read-write user
# arguments: user [noauth|auth|priv] [restriction_oid]
rwuser topsecretv3

# rouser: an SNMPv3 read-only user
# arguments: user [noauth|auth|priv] [restriction_oid]
rouser topsecretv3_ro

# rocommunity: an SNMPv1/SNMPv2c read-only access community name
# arguments: community [default|hostname|network/bits] [oid]
rocommunity topsecret_ro

# rwcommunity: an SNMPv1/SNMPv2c read-write access community name
# arguments: community [default|hostname|network/bits] [oid]
rwcommunity topsecret


2b. Disk space can be monitored by adding entries to the 'disk' section. Example:

disk /
disk /boot
disk /usr

2c. Processes can be monitored by adding entries to the 'proc' section. Example:

proc java
proc postmaster
proc mysqld

2d. System load thresholds can be set by adding an entry to the 'load' section; the three values are the maximum 1-, 5- and 15-minute load averages. Example:

load 5 5 5

2e. The EXAMPLE.conf file in the source directory shows more capabilities of the SNMP agent (you can run executables/scripts and return one line of output and an exit code)
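
The directives from steps 2a to 2d can be collected into one file. The sketch below writes them to a scratch file for review rather than touching the live configuration. SNMPD_CONF_OUT is a hypothetical variable name, and the 192.168.1.0/24 network is only an example of the optional source restriction accepted by rocommunity.

```shell
#!/bin/sh
# Sketch: write the directives from steps 2a-2d into a scratch file.
# SNMPD_CONF_OUT is a hypothetical variable; it defaults to a temp file so
# nothing under /usr/local/share/snmp is overwritten by accident.
SNMPD_CONF_OUT="${SNMPD_CONF_OUT:-$(mktemp)}"

cat > "$SNMPD_CONF_OUT" <<'EOF'
# read-only community, limited to one management network (example range)
rocommunity topsecret_ro 192.168.1.0/24

# file systems to monitor
disk /
disk /boot
disk /usr

# processes to monitor
proc java
proc postmaster
proc mysqld

# load thresholds for the 1-, 5- and 15-minute averages
load 5 5 5
EOF

echo "wrote $SNMPD_CONF_OUT"
```

After reviewing the file, copy it to /usr/local/share/snmp/snmpd.conf and restart the agent.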

3. Start the SNMP daemon (agent) by running /usr/local/sbin/snmpd. To start snmpd automatically at boot time, add the line '/usr/local/sbin/snmpd' to /etc/rc.d/rc.local on Red Hat systems, or to the equivalent file on other flavors of Unix.

3a. The agent logs to /var/log/snmpd.log (for more detailed debugging info, start the agent with the -D flag)
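
Adding the startup line by hand is fine, but a small helper keeps the line from being added twice if the step is repeated. This is a minimal sketch; ensure_line is a hypothetical helper name, and the demonstration runs against a scratch file instead of the real rc.local.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file; on a Red Hat system the real target
# would be /etc/rc.d/rc.local (run as root).
rc_file=$(mktemp)
ensure_line "$rc_file" /usr/local/sbin/snmpd
ensure_line "$rc_file" /usr/local/sbin/snmpd   # second call changes nothing
cat "$rc_file"
```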

4. On the SNMP monitoring host, use snmpget to query the SNMP agent running on the target host. The trick here is to know which OIDs to use when you query the agent.

Examples:

Get available disk space for / on the target host:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.9.1.7.1

(this will return available disk space for the first entry in the 'disk' section of snmpd.conf; replace 1 with n for the nth entry)
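
To avoid retyping long OIDs, the "replace 1 with n" rule can be wrapped in a small function. This is a sketch only: dsk_oid is a hypothetical helper, and the column numbers are taken from the dskTable definition in UCD-SNMP-MIB (7 is the available-space column used in the command above).

```shell
#!/bin/sh
# dsk_oid COLUMN N: print the numeric OID for COLUMN of the Nth 'disk' entry.
DSK_ENTRY=.1.3.6.1.4.1.2021.9.1

dsk_oid() {
    case "$1" in
        dskPath)    col=2 ;;   # mount point
        dskTotal)   col=6 ;;   # total size
        dskAvail)   col=7 ;;   # available space
        dskUsed)    col=8 ;;   # used space
        dskPercent) col=9 ;;   # percentage used
        *) echo "unknown column: $1" >&2; return 1 ;;
    esac
    echo "$DSK_ENTRY.$col.$2"
}

dsk_oid dskAvail 1      # the OID used for / in the command above
dsk_oid dskPercent 3    # percentage used for the third entry (/usr in step 2b)
```

The result can be passed straight to snmpget, for example: snmpget -v 1 -c "community" target_name_or_ip "$(dsk_oid dskPercent 3)"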

Get the number of java processes running on the target host:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.2.1.5.1

(replace 1 at the end with n for the nth entry in the 'proc' section)
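
A monitoring script usually needs just the number from the reply, not the whole line. The sketch below pulls the value out of one line of snmpget output and raises an alert when the count is zero. snmp_value and check_proc are hypothetical helper names, and the two sample lines are illustrative, not captured from a real host.

```shell
#!/bin/sh
# snmp_value LINE: print what follows "TYPE: " in one line of snmpget output.
snmp_value() {
    printf '%s\n' "$1" | sed 's/^.* = [A-Za-z0-9]*: //'
}

# check_proc NAME LINE: report whether the process count in LINE is non-zero.
check_proc() {
    count=$(snmp_value "$2")
    if [ "$count" -gt 0 ]; then
        echo "OK: $1 has $count process(es)"
    else
        echo "ALERT: no $1 process running"
        return 1
    fi
}

# Illustrative reply lines, in the format snmpget prints by default.
check_proc java 'UCD-SNMP-MIB::prCount.1 = INTEGER: 3'
check_proc mysqld 'UCD-SNMP-MIB::prCount.3 = INTEGER: 0' || true
```

In real use the second argument would be the output of the snmpget command shown above.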

Get the 1-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.10.1.3.1

Get the 5-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.10.1.3.2


Get the 15-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.10.1.3.3
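
Load averages are fractional, and the shell only does integer arithmetic, so comparing a returned value against a limit needs a little help. The sketch below delegates the comparison to awk. load_exceeds is a hypothetical helper name and the sample values are made up for illustration.

```shell
#!/bin/sh
# load_exceeds VALUE LIMIT: succeed (exit 0) when VALUE is greater than LIMIT.
load_exceeds() {
    # awk compares the two numbers as floating point
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# Illustrative load values, compared with the limit of 5 from step 2d.
for sample in 0.15 5.20 4.99; do
    if load_exceeds "$sample" 5; then
        echo "$sample: above limit"
    else
        echo "$sample: ok"
    fi
done
```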


Get various CPU utilization metrics on the target host via snmpwalk:

snmpwalk -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.11

Sample output:

UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 1
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 5
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 5
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 8
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 99
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 1007102
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 3879
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 544737
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 238396576


To retrieve a specific metric, for example the number of interrupts, you would do:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.11.7.0

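(we append 7.0 to the OID that we used in snmpwalk: ssSysInterrupts is object 7 under that subtree, and .0 is the instance suffix of a scalar. Counting lines in the snmpwalk output happens to give the right number for the first entries, but not in general; the ssCpuRaw* counters, for example, are objects 50 to 53)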
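
Because the object number does not always match the position in the snmpwalk output, a lookup by name is safer than counting lines. The following is a sketch under that assumption: ss_oid is a hypothetical helper, and the numbers are those assigned to each object in UCD-SNMP-MIB.

```shell
#!/bin/sh
# ss_oid NAME: print the numeric OID of a systemStats scalar.
SS_BASE=.1.3.6.1.4.1.2021.11

ss_oid() {
    case "$1" in
        ssSwapIn)        sub=3 ;;
        ssSwapOut)       sub=4 ;;
        ssIOSent)        sub=5 ;;
        ssIOReceive)     sub=6 ;;
        ssSysInterrupts) sub=7 ;;
        ssSysContext)    sub=8 ;;
        ssCpuUser)       sub=9 ;;
        ssCpuSystem)     sub=10 ;;
        ssCpuIdle)       sub=11 ;;
        ssCpuRawUser)    sub=50 ;;   # raw counters do not follow line order
        ssCpuRawNice)    sub=51 ;;
        ssCpuRawSystem)  sub=52 ;;
        ssCpuRawIdle)    sub=53 ;;
        *) echo "unknown name: $1" >&2; return 1 ;;
    esac
    # .0 is the instance suffix for a scalar object
    echo "$SS_BASE.$sub.0"
}

ss_oid ssSysInterrupts
ss_oid ssCpuRawIdle
```

If the UCD-SNMP-MIB file is installed on the monitoring host, snmpget should also accept the symbolic form directly, e.g. UCD-SNMP-MIB::ssSysInterrupts.0, which is the same notation the sample output uses.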


Get various memory utilization metrics on the target host via snmpwalk:

snmpwalk -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.4

Sample output:

UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 2048276
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 2005604
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 998560
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 89896
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 2095500
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000
UCD-SNMP-MIB::memShared.0 = INTEGER: 0
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 234884
UCD-SNMP-MIB::memCached.0 = INTEGER: 459016
UCD-SNMP-MIB::memSwapError.0 = INTEGER: 0
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:


To retrieve a specific metric, for example the amount of available swap space, you would do:

snmpget -v 1 -c "community" target_name_or_ip .1.3.6.1.4.1.2021.4.4.0

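(we append 4.0 to the OID that we used in snmpwalk: memAvailSwap is object 4 under that subtree, and .0 is the instance suffix of a scalar. Counting lines in the snmpwalk output stops working after memAvailReal, which is object 6; memTotalFree, the next line, is object 11)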
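
The raw memory numbers are easier to alert on as a percentage. The sketch below does the arithmetic with the values from the sample output above. pct_used is a hypothetical helper name, and treating buffers and cache as reclaimable is an assumption about how you want to count "used" memory, not something the agent decides for you.

```shell
#!/bin/sh
# pct_used TOTAL AVAIL: integer percentage of TOTAL that is not available.
pct_used() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Values from the sample snmpwalk output above.
mem_total=998560    # memTotalReal
mem_avail=89896     # memAvailReal
mem_buffer=234884   # memBuffer
mem_cached=459016   # memCached

echo "real memory used: $(pct_used "$mem_total" "$mem_avail")%"

# Counting buffers and cache as reclaimable gives a lower figure.
reclaimable=$(( mem_avail + mem_buffer + mem_cached ))
echo "excluding buffers/cache: $(pct_used "$mem_total" "$reclaimable")%"
```

With the sample values this prints 90% and 21%, which shows how much the choice of definition matters on a host with a large file cache.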

Note: for CPU and memory stats, you don't need to add any special directives in the snmpd.conf configuration file

6 comments:

Anonymous said...

Great article, it's very hard to find this information anywhere. Are there any howto's on getting this same information sent out as SNMP traps (e.g. when CPU% or disk space gets above a certain level)?

Unknown said...

I would like to know how you can monitor the following process ...

perl -w /opt/aws/platform/admindaemon/script/awsconfclientd id

As we know that the Operating System recognises it, just as perl without arguments...

Any idea to solve this issue ??

My email is iuzcat@cantv.com.ve
Thanks and Regards.

Grig Gheorghiu said...

Would it work if you called the perl command line inside a bash script and monitored the name of that script?

If not, I'd use a different system for monitoring processes -- you could ssh into the remote system using public keys, then run ps and grep for the exact process name.

Grig

Javier said...

Great work, but I can't find /usr/local/share/snmp/snmp.conf. I'm using Debian; I tried to find it with locate, but no luck.
SNMP is working, because I used snmpwalk from another machine and got an answer.
And where did you find which MIB query to use? Is it system dependent?

PS: Sorry for my bad English, it is not my mother tongue.

Anonymous said...

snmp with java coding?

creature said...

Hey, I've been trying to get snmp working for a while now... I've given up on trying to get the host to tell me when things are wrong and gone for the "ask about everything that could possibly have gone wrong, in case something has" approach -_-

I tried your setup for proc and I can't seem to get it working properly... I only seem to be able to get proc 1 and proc 2 checked, but I get

Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
Failed object: UCD-SNMP-MIB::prMin.3

for proc 3... any ideas what I'm doing wrong???

Below is my proc list..

proc sshd
proc apache2
proc mysqld
