Friday, April 18, 2014

Grabbing the statement durations out of a log file using gnu grep

Sometimes I find myself wanting to grep all the query durations out of a file for sorting.  I recently discovered that I can use gnu grep's lookbehind syntax like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv

and then sort them numerically like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv | sort -n | tail

Thursday, November 21, 2013

Ubuntu 3.11.0 kernel not as stable as hoped

As I mentioned in my previous post, Ubuntu recently released a 3.11.0 kernel for Precise.  Unfortunately, while it appeared stable during a long burn-in process with various pgbench workloads and scale sizes, it crashed shortly after putting the machine into service, so make sure you test thoroughly on your hardware. 

This particular hardware was the Dell Poweredge 720 that rackspace uses.

Guess we're going back to custom compiling kernels for a little while longer.

Anyone else have experiences good or bad with it?

I believe this is the same kernel that's shipping with 13.10 Saucy Salamander.

Friday, November 15, 2013

Ubuntu has released a 3.11.0 kernel for 12.04 LTS!

This one is for those of you compiling a custom kernel or staying on 10.04 LTS because of this issue on database servers in the 3.x kernels: https://lkml.org/lkml/2012/10/9/210

Ubuntu has released linux-image-3.11.0-13-generic for 12.04 in which that fix is included.  It's actually been included since 3.9rc2, but Ubuntu skipped straight to 3.11.

And there was much rejoicing in the PostgreSQL community.  At least from those of us who manage 12.04 LTS PostgreSQL servers!

For those of you who don't recall, here's a link to the big discussion thread:

http://www.postgresql.org/message-id/60B572D9298D944580F7D51195DD30804357FA4ABF@VMBX125.ihostexchange.net

One item of note on 3.11 is that the kernel.sched_migration_cost setting is now called kernel.sched_migration_cost_ns, so if you are setting that via sysctl, you will need to update your sysctl.conf or sysctl.d file after upgrade.

Now go upgrade your test servers and let us know how it performs!  I've been burning in a new server for about 24 hours with pgbench and the 3.11.0-13-generic kernel and so far it's been stable and a good performer.

Saturday, August 10, 2013

Get your PgBouncer pools stats into Ganglia

Just whipped up a quick ruby script to put PgBouncer "show pools" data into ganglia using gmetric.  Based it heavily off the postgres_gmetric script.

You can find the pgbouncer gmetric script here: https://github.com/pgexperts/pgbouncer_gmetric

Wednesday, February 27, 2013

Script to add the apt.postgresql.org respository on Debian/Ubuntu

You've probably heard the announcement: Martin Pitt's PPA is being deprecated.  This is in an effort to push folks towards the new official apt.postgresql.org.

See the announcement here: http://www.postgresql.org/about/news/1432/

Also, see the deprecation notice on Martin's PPA:
https://launchpad.net/~pitti/+archive/postgresql

Since we have quite a few customers using Martin's PPA for PostgreSQL packages under Ubuntu, I wrote a little shell script to switch repositories if you're using Martin's PPA and just install the apt.postgresql.org repository if you are not.

You can find it here: https://github.com/pgexperts/add-pgdg-apt-repo

Thursday, February 14, 2013

Finding unreplicated tables and sequences in slony with check_postgres and nagios

Since slony doesn't automatically add tables to replication for you, sometimes it's handy to double check that everything is being replicated.  If your slony cluster is named "sample" then you would have a slony schema named "_sample". As such, you can query against the sl_table and sl_sequence tables to find unreplicated tables like so:

SELECT pg_class.OID::regclass as table_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'r' AND pg_class.OID NOT IN (SELECT tab_reloid FROM _sample.sl_table) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample')

SELECT pg_class.OID::regclass as sequence_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'S' AND pg_class.OID NOT IN (SELECT seq_reloid FROM _sample.sl_sequence) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample');

 Those queries are a bit unpleasant to remember, so I like to make them into views like so:


CREATE OR REPLACE VIEW public.unreplicated_tables AS SELECT pg_class.OID::regclass as table_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'r' AND pg_class.OID NOT IN (SELECT tab_reloid FROM _sample.sl_table) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample') AND pg_class.relname <> 'fwd_epc_map';

CREATE OR REPLACE VIEW public.unreplicated_sequences AS SELECT pg_class.OID::regclass as sequence_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'S' AND pg_class.OID NOT IN (SELECT seq_reloid FROM _sample.sl_sequence) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample');

Now, that's a little handier because we can just select straight out of the view.  Now that we have these easy to use views, we can use check_postgres.pl's custom_query check to setup a nagios alert whenever there are unreplicated tables or sequences.

First we need to create the check commands like so:

define command{
        command_name    check_pgsql_slony_unreplicated_tables
        command_line    /usr/lib64/nagios/plugins/check_postgres.pl --host $HOSTADDRESS$ --port=5432 --dbuser nagios --dbname mydb --action=custom_query --query="SELECT count(table_name) AS result FROM public.unreplicated_tables" --valtype=integer --warn=1 --critical=5
}

define command{
        command_name    check_pgsql_slony_unreplicated_sequences
        command_line    /usr/lib64/nagios/plugins/check_postgres.pl --host $HOSTADDRESS$ --port=5432 --dbuser nagios --dbname mydb --action=custom_query --query="SELECT count(sequence_name) AS result FROM public.unreplicated_sequences" --valtype=integer --warn=1 --critical=5
}
I apologize for all the wrapping there.

Now that we have the commands defined, we use them in a service definition:

define service {
    use                          postgres-service
    notifications_enabled        1
    hostgroup_name               slony_primary
    service_description          Slony Unreplicated Tables
    check_command                check_pgsql_slony_unreplicated_tables
    contact_groups               pgexperts
}

define service {
    use                          postgres-service
    notifications_enabled        1
    hostgroup_name               slony_primary
    service_description          Slony Unreplicated Sequences
    check_command                check_pgsql_slony_unreplicated_sequences
    contact_groups               pgexperts
}
And that's it! Now we'll receive an alert whenever an unreplicated table or sequence exists on the servers in hostgroup slony_primary.



Tuesday, February 12, 2013

Russian Doll Syndrome / Please Monitor Your RAID Volumes

As part of setting up nagios monitoring for customers, I always make sure that the RAID volumes are being monitored on the DB servers.  More often than not, I have to tell the customer there are bad disks in one or more of the RAID volumes.  Don't be one of those customers, make sure you monitor the RAID volumes on your PostgreSQL (and other) servers!

So, today I was setting up some of this same monitoring for LSI RAID controllers in a Dell server using the check_sasraid_megaraid nagios plugin.   This plugin requires the LSI SNMP installer as well as net-snmp and a few other miscellaneous packages.

This brings us to the Russian Doll Syndrome.  I download the zip file from the LSI support site and unzip it to find a zip full of zips:

$ unzip 12.10.03.01_MegaRAID_SNMP_Installers.zip
Archive:  12.10.03.01_MegaRAID_SNMP_Installers.zip
 extracting: SAS_SNMP_Win_Installer.zip 
 extracting: SAS_SNMP_Linux_Installer.zip 
 extracting: SAS_SNMP_Linux_x64_Installer.zip 
 extracting: SAS_SNMP_Solaris_Installer.zip 
 extracting: SAS_SNMP_Solaris11x86_Installer.zip 
  inflating: 12.10.03.01_MegaRAID_SNMP_Installers.txt 

Ok, fine, they have a zip for each architecture, so let's unzip the Linux_x64 one and get on with this:

$ unzip SAS_SNMP_Linux_x64_Installer.zip
Archive:  SAS_SNMP_Linux_x64_Installer.zip
  inflating: MD5CheckSum.txt        
 extracting: SAS_SNMP_Linux_x64_Installer-12.10-0301.zip

Oh, sweet, another zip file!

$ unzip SAS_SNMP_Linux_x64_Installer-12.10-0301.zip
Archive:  SAS_SNMP_Linux_x64_Installer-12.10-0301.zip
  inflating: readme.txt             
 extracting: sas_snmp_64bit.tar.gz  

And inside that, we find a tarball:

$ tar xvfz sas_snmp_64bit.tar.gz
sas_snmp-12.10-0301.x86_64.rpm

And finally an RPM.  Yeesh LSI, why couldn't we just have the RPM right inside the SAS_SNMP_Linux_x64_Installer.zip file or even just in the top level zip file?

So, all kidding aside, please make sure you're monitoring the RAID volumes on your PostgreSQL servers!