Posted on November 24, 2007
After using PL/Proxy in production for a bit and working on some scaling and partitioning projects, I’ve come up with a few observations:
- The PL/PgSQL based configuration is flexible enough to not be considered a hack, but the lack of real in depth documentation leaves one to discover why on their own.
- If you have an environment, like mine, where you’re constantly expanding, building out new infrastructure and have the potential to add new pgBouncer servers on your DB tier for PL/Proxy, Do not use the CONNECT methodology. Instead invest the time to write your configuration functions and use the CLUSTER, RUN ON method instead.
- Plan for more time in debugging PL/PgSQL functions. While generally it’s just replacing the body and language in your functions, it does take more organization and testing to make sure your new PL/PgSQL functions and PL/Proxy counterparts work flawlessly in production.
- Single database machine development environments shouldn’t prevent you from testing PL/Proxy functionality. Setup pgBouncer and various databases to mimic your production environment. We’ve skipped this step in development and paid the price during rollout.
- Even with Marko’s suggestions we’ve yet to find a good way to handle the client_encoding issue. Unfortunately what this means is as we scale, we’re still stuck in SQL_ASCII land and the job to convert to UTF-8 just gets bigger and bigger. Note my dirty hack of using a wrapper function kicks out log messages that normal PLProxy based functions do not, which clutter up our Postgres logs.
- It’s solid. PL/Proxy is still everything I thought it was when I rolled it out and I’ve been very happy with the reliability and scaling it buys me.
I’m looking forward to Marko’s next release of PLProxy which I'm guessing will hit sometime in December. If I read his notes right, it may address my client_encoding issue.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on October 06, 2007
This evening I rolled out our first use of plProxy tonight to facilitate cross PostgreSQL server communication. It was a bumpy start going from development to production so I thought I would detail a few problems I ran into:
- The version of flex on RHEL 4.0 (don’t ask - it’s related to vendor support issues with our SAN) doesn't like the plProxy install process. scanner.l doesn't parse properly and thus scanner.c is never made. I got around this by running it on a more current Gentoo based system. I then tested the compiled version of plproxy on our failover/standby PostgreSQL server.
Our older primary production database is still SQL_ASCII due to the amount of work involved migrating it to UTF-8, which is one of our mid-term projects. plProxy will not communicate between databases if the encodings do not match. I wrote a wrapper function that sets client_encoding to SQL_ASCII on our UTF-8 database sessions for the function that is being proxied. Here's an example of what I ended up doing:
CREATE OR REPLACE FUNCTION myFunction(IN uid1 int, IN uid2 int) RETURNS bool AS $$
CONNECT 'host=pgbouncer01 dbname=production user=www port=6000';
$$ LANGUAGE plproxy;
CREATE OR REPLACE FUNCTION myFunctionWithEncoding(IN uid1 int, IN uid2 int) RETURNS bool AS $$
DECLARE
result RECORD;
BEGIN
SET client_encoding='SQL_ASCII';
SELECT INTO result * FROM myFunction(uid1, uid2);
SET client_encoding='UTF-8';
RETURN result.boolcol;
END;
$$ LANGUAGE plpgsql;
This is a pretty dirty hack, but it gets the job done. In addition, I probably didn’t need to set the encoding back to UTF-8 but I thought it better to be safe. I'll be happy once I get our production database set to UTF-8. The main gotcha I found in plProxy is this encoding issue. Something that will be resolved for us in the long run, but will have to be a hack in the short run.
Per Skype’s setup, we use pgBouncer between the servers for plProxy and cut down on the overhead required for startups. One of the nice things about this is since the pgBouncer instance is only used for inter-PostgreSQL server communication, I’ve been able to determine that we’re doing approximately 100 executions per second. While this is pretty low compared to our PostgreSQL usage in other places, it gives me a good feeling about our strategy with plProxy. After closely watching machine stats for the first 30 minutes of production, it appears that plProxy has very little if any impact on machine resources in our infrastructure.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on October 01, 2007
In my previous blog post about Parallel Reindexing, Robert Treat asked about locking issues, and as my follow-up comment stated, the script wasn't engineered with locking in mind. It was built for offline reindexing. Part of the issue with Reindexing is while you can create an index concurrent with normal database operation in PostgreSQL 8.2, you can not reindex concurrently. This feature is currently on the PostgreSQL Todo List.
As I thought about the problem, it occurred to me that on the surface this was a fairly trivial problem to solve. Get the schema for the indexes of a table, create new indexes currently with a temporary name, drop the old index and rename the new index. There might be some locking issues with ALTER INDEX ... RENAME TO, but overall it seemed like the easiest and best strategy to go with. That is, until I started prototyping. Then it occurred to me - Primary Keys. Primary keys differ from normal indexes in that they are also constraints on the table. This is also true of unique indexes. By adding constraints to the mix, and having to remove and re-add primary keys, the equation just became more complex.
Take the given pg_constraint table from the pg_catalog schema:
Table "pg_catalog.pg_constraint"
Column | Type | Modifiers
---------------+------------+-----------
conname | name | not null
connamespace | oid | not null
contype | "char" | not null
condeferrable | boolean | not null
condeferred | boolean | not null
conrelid | oid | not null
contypid | oid | not null
confrelid | oid | not null
confupdtype | "char" | not null
confdeltype | "char" | not null
confmatchtype | "char" | not null
conkey | smallint[] |
confkey | smallint[] |
conbin | text |
consrc | text |
Indexes:
"pg_constraint_oid_index" UNIQUE, btree (oid)
"pg_constraint_conname_nsp_index" btree (conname, connamespace)
"pg_constraint_conrelid_index" btree (conrelid)
"pg_constraint_contypid_index" btree (contypid)
This would appear to be the primary source of internal data organization for table related constraints, and while it most likely is, merely removing a row from this table and inserting a new one for the new index doesn't give us the ability to remove a primary key and make a newly constructed index the new primary key for a table.
This is where I've explored to with regard to the index related constraints. As I explore further in the pg_catalog schema and how an index is stored as a primary key, I will detail it further here.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on September 24, 2007
One of the issues we’ve had to deal with at myYearbook is how to deal with reducing index bloat in PostgreSQL. The reindex database command does this pretty handily, but since it is performed table by table in serial, it can be pretty slow when you're sporting 30GB+ tables. To solve this I threw together this little CLI php script.
The idea is pretty straightforward - get a list of all the tables with indexes from the PostgreSQL system catalog for a given database, split them up into chunks and run multiple reindexes at the same time. If you find it useful or have ideas to improve it, drop me a note and let me know.
Here is an example of usage:
[gmr@gmr-imac ~]$ ./parallelReindex.php
Error: Required parameters not set.
Usage: parallelReindex.php parameters
Parameters:
-host Specify the database host to connect to - required
-port Specify the port to connect to
-dbname Specify the database name to connect to - required
-user Specify the database user to connect as - required
-password Specify the database user password
-threads Set the number of parallel reindex tasks to run (default 15)
Example:
./parallelReindex.php -host localhost -user postgres -dbname test
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on September 04, 2007
Recently there has been a very active discussion on the pgsql-advocacy mailing list about renaming PostgreSQL to Postgres. If you have a vested interest or an opinion either way, please drop an email and let the community know. For a little background, the project was originally named Postgres. In 1995 it was renamed to Postgres95 and then in 1996 it was once again renamed, this time to PostgreSQL. The discussion is about returning to the more pronounceable original name. If you're interested, the PostgreSQL site has more information on the history of the project here.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on August 31, 2007
Recently I submitted a patch to PostgreSQL that will allow pg_dump and pg_dumpall to ignore tablespace data. This is useful if you use tablespaces on one machine and want to restore to another with a different tablespace setup. Since this patch won't likely make it into PostgreSQL until 8.4, I thought I would make it available here for those that could use it.
This should commit cleanly to 8.2.4, and I'll try to keep the patch up to date through 8.3. As of right now, I believe I am the only one who has tested it, so keep that in mind. That being said, I'd use it in production. If you find it useful, please let me know, and likewise if you find any issues with it.
pg_dump-8.2.4-no-tablespace-option.patch.txt
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on July 08, 2006
I flew up yesterday for a the PostgreSQL conference which is meeting at Ryerson University. My plane got in late but in time to have dinner with Gavin Sherry, Josh Berkus, and Greg Stark. Then, even though it was only about 8 or 9 my time, I was able to get to sleep in order to get as much sleep as possible.
Thus far today I have attended Bruce Momjain's keynote, which was a good oveview of the initial history of PostgreSQL, a talk on Bizgres and what they're doing with PostgreSQL to enhance performance, a Performance round table moderated by Simon Riggs and then an Intro to PostgreSQL Hacking. I'm currently in the PostgreSQL Advocacy session.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on July 08, 2006
I had no idea how big PostgreSQL is in Japan. I'm in the PostgreSQL Fundraising and Advocacy meeting right now and JPUG is pdoing a presentation. With ~3,000 members and 8 branches, JPUG is very well organized and impressive. They distributed one of their newsletters to us which was very high quality. While I'm having a bit of a hard time understanding the speaker, I cant help be be amazed at how well these guys are organized and the thriving community over there.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on December 12, 2005
Jim Nasby sent an email to the pgsql-advocacy list pointing out
this article comparing Oracle, PostgreSQL, MySQL from a getting started perspective. The author notes his predisposition to favor Oracle and calls it the gold standard. In his measure, PostgreSQL got the best score. It's worth a read as he has some valid points, both positive and negative on PgSQL.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on December 10, 2005
I have moved most of my pgsql.gavinroy.com site to
a new section on this site. This section includes my 2002 OSCON presentation, PgSQL artwork, a backup script, and some functions to take PgSQL query results and spit them out in XML using PDO.
Filed under: PostgreSQL |
Tagged with: Postgres |
Posted on April 27, 2004
I've been working with Andrew Dunstan for the past few days on getting
pgFoundry up and running on
GForge. We're now live and taking new projects. This is the official replacement for
Gborg, which has been the
PostgreSQL community projects site. It's pretty exciting, and I'll be moving most of my
PostgreSQL based open source projects there.
Filed under: PostgreSQL |
Tagged with: Postgres |