[Navigation Bar]  
 
 

    

[OpenSUSE powered]
[BUSH powered]
[vi powered]
[XML] [RSS]
Dinner Meeting Minutes
Networking, Learning and Working Together
 
  • June (tentative)
    Introduction to MONO
  • May (tentative)
    Introduction to OpenGL
  • April (tentative)
    Introduction to BUSH
  • March
    (tentative)
    Linux Job Hunting
  • February
    PostgreSQL Round Table
  • January
    (Canceled - Chairman Unavailable)
  • December
    (Canceled - Chairman Unavailable)
  • November
    (Canceled - Chairman Unavailable)
  • October
    (Canceled - Chairman Unavailable)
  • September
    (Canceled - Chairman Unavailable)
  • August
    (Annual Summer Retreat)
  • July
    (Canceled - Chairman Unavailable)
  • June
    (Canceled - Chairman Unavailable)
  • May
    (Canceled - Chairman Unavailable)
  • April
    (Canceled - Chairman Unavailable)
  • March
    (Canceled - Chairman Unavailable)
  • February
    P3P Privacy Policies
  • January
    (Java Applets, Canceled - Insufficient Interest)
  • December
    (Charity Christmas Party)
  • November
    Introduction to Atomic OS
  • October
    Introduction to JavaScript
  • September
    Introduction to SQL
  • August
    (Annual Summer Retreat)
  • July
    (Canceled - People Away)
  • June
    (Canceled - People Away)
  • May
    A System for Managing Game Entities
  • April
    (Canceled - Scheduling Problems)
  • March
    Thousands of Clients Per Server
  • February
    Third-Person Camera Navigation
  • January
    (No Meeting)
  • December
    (Charity Christmas Party)
  • September
    (Special Online Game Meeting)
  • August
    (Annual Summer Retreat)
  • July
    Introduction to Perl
  • June
    Introduction to PHP
  • May
    (Call For Linux Projects III)
  • April
    Simple DirectMedia Library
  • March
    (Special Client Meeting)
  • February
    (Special Client Meeting)
  • January
    (Special Client Meeting)
  • December
    (Charity Christmas Party)
  • November
    (No Minutes Taken)
  • October
    (Canceled)
  • September
    (Special Employee Conduct Policy Meeting)
  • August
    (Annual Summer Retreat)
  • June
    Survey of Web Development Tools
  • May
    Chris Crawford on Game Design
  • April
    Logical Reasoning II
  • March
    Logical Reasoning
 

Live in southern Ontario? Attending dinner meetings is free.

PegaSoft's February Dinner Meeting

Date: February 7, 2008
Location: Linux Caffe


Attendance

Ken B, Chris B, Colin M, Dave G, Alex B, Darko H, Abdullah H, Amanda
Mike H canceled due to medical reasons.
Matt H had something come up at work and was unable to attend.

Personal Introductions

Due to the number of new people, each person introduced themselves.

General Talk

IT 360 Toronto Expo (Colin M)

Colin will be presenting a talk at IT 360 on the
subject of "Open Source Hidden Gems".  He solicited ideas for his talk.  Chris
B reminded him that the topics should fit the business-related nature of IT 360.

Alex B suggested the screen command.

Chris B suggested cfengine (system management) and spread (multicasting).

Snort (a system intrusion detecting system) was also suggested.

Business Shell (Ken B)

Ken presented his open source Business Shell (Bush) to
his company.  It was well-received and he hoped to advance to proof-of-concept
tests.  Bush reduces development and maintenance costs of complex, web-based
systems by using software engineering principles.  Bush is based on a subset
of the Ada language called "AdaScript".

Chris B half-seriously discussed adopting Ada at Afilias.  Such a massive
migration to a new language was unlikely, but Ada's software engineering
approach had merit for Afilias.

Chris B went on to explain Ada.  The original Ada 83 was a large language
with strict, strong typing and bloated compilers.  Many developers at that
time developed preconceptions against Ada.  Ken B acknowledged the problems
of Ada 83, particularly the lack of objects, but his audience at work was not
familiar with Ada 83 and it wasn't necessary for him to discuss the distant
past.

Chris B pointed out that Ada always admitted that it was a large, complex
language.  In today's world, the reality is that C++, with its STL and other
libraries, is even larger and more complex than the original Ada, a fact that
anti-Ada developers are reluctant to admit.

It was Chris' opinion that GCC Ada is a modern and robust compiler, "production
worthy" for businesses.

Alex B argued that--as a Perl specialist--avoiding unnecessary keystrokes is
essential for productivity and he believed that a "verbose" language like
Ada reduced his ability to do his work.  Ken B pointed out that Bush has a
number of short cuts for small scripts, but as systems grow larger and more
complex, reducing keystrokes is less important than reducing time on design
and maintenance.  Ken addressed this issue with his talk to his company.

Further discussion was postponed to focus on the Round Table.


Open PostgreSQL Round Table (Chris B)

Chris B suggested taking turns around the table in order to give everyone
an equal opportunity to ask questions.


Chris B: The Slony Replication System

Chris B began by talking about Slony.

Chris is one of the developers of Slony, a single master to multiple slaves
replication system for PostgreSQL.  Multiple master replication is a bit of
a "holy grail" in replication: although it improves reliability, it doesn't
significantly impact performance.  The replication system is currently used
full-time at Afilias to backup top-level DNS domains.

Slony is an add-on to PostgreSQL.  If replication was part of the PostgreSQL
core, the database software version would create replication issues (that is,
if one database was PostgreSQL 8.2 and another 8.3, for example).  Slony has
to be maintained as a separate project.

Replication may not be an issue if you can dump and restore databases quickly.
For companies with large amounts of data and strong uptime requirements
(that is, SLA - "service level agreements"), replication is mandatory.

Slony was based on the trigger-based "ER server" proof-of-concept project.


Ken B: What were the significant differences between MySQL and PostgreSQL?

Chris B quoted an X Windows designer, saying that there is no such thing as an
intuitive interface: all things are learned and developers are biased by what
they are familiar with.

In its early days of web development, MySQL was easier for ISPs to use than
PostgreSQL.  Features such as advanced SQL and integrity checks (foreign keys)
were not required.  The engine was thought to be a way to store data and
it was the programmer's responsibility to validate the data before giving it
to a database.  However, MySQL had the ability to store databases in different
directories (e.g. one per user), which was appealing to ISPs, but PostgreSQL
didn't get that feature until version 8.

MySQL is maintained by a commercial company and has a commercially-driven
mandate: if a company needs something, or if a new feature is interesting,
it will be added without integrating it into the design.  In the same way,
MySQL has 6 engines, including InnoDB (owned by Oracle), Solid (bought by
IBM (?)), Falcon (built by a designer of Interbase hired by MySQL), MySQL's
MaxDB product (formerly ADABASE).  Each engine has different semantics and
works differently.  This causes problems like consistent transactions across
the different engines.  There is no design consistency or integrity.  MySQL
is treated as a product.

PostgreSQL, on the other hand, is a community project maintained by more than
150 active developers where the public contributes ideas and changes.  New
features are discussed and debated.  Innovators are forced to argue their
designs and convince others with different perspectives, meaning that ideas
must be widely accepted and valid over many different environments.  Designs
must be orthogonal, of good quality and be coherent to get approved.  Although
PostgreSQL is more complex and has more functionality than MySQL, that learning
curve is reduced because of the consistent design.  There are less exceptions
or special cases and things intuitively work they way they should.

There is a core committee of 10 people who identifying bugs,
release security announcements and prepare releases.  PostgreSQL 8.3 was
delayed several weeks (from December to this week) due to a security issue
that was discovered and repaired.

Unlike MySQL, PostgreSQL has a single, unified database engine.

PostgreSQL has advanced SQL features, such as the ability to generate an
table index using a function.  For example, using the Soundex algorithm
(Wikipedia) to index names by the way they sound.  This can be more efficient
than calling a Soundex function directly in a query.

PostgreSQL transactions are implemented using a system called multi-version
concurrency control.  When a query changes a row, the original row remains
in the table and a new row is created.  They are differentiated by version
identifiers.  Old versions of tuples (that is, table rows) are left
for queries started prior to a transaction.  Newer queries use the rows with
the newer version number.  The result is very efficient transactions that use
no row locking, ideal for massive, slow operations like producing reports.

MySQL, on the other hand, uses row locking.  Since it locks each row, large
transactions block users from accessing the database for long periods of time.

While PostgreSQL is open source and developers can extend PostgreSQL with
commercial products, MySQL strongly discourages developers to develop
commercial add-ons for the free version of their software.  "It is never
wrong to buy a license," as the expression goes.  Developers are pushed towards
buying the non-GPL, commercial version.

There are many more books on MySQL but most of them are opportunistic and of
poor quality.  The books on PostgreSQL are usually better written, although
fewer in number, and remain on the shelves of bookstores longer because of
their lasting value.

Givex, the company the manages gift cards for businesses, is an example of a company using PostgreSQL.


Ken B: How does PostgreSQL handle large tables (100 million rows or more)?

PostgreSQL breaks up tables into multiple files, each file with a size cap of
1 gigabyte.  So a 100 million row table with 50 bytes per row would be stored
in about 5 files.

Darko H asked how massive updates to such a table compared with static data.

Chris B explained that it depended on the version of PostgreSQL.  The
transaction system requires VACUUM, a garbage collection command, to clean
up old rows left over from transactions.

  Version 7.1 (many years ago): vacuum used table locking
  Version 7.2 : table locking was removed but vacuum was slow
  Version 7.4 : improvements were made to caches and buffers affected by
                vacuum
  Version 8.0 : greatly improved buffer management (buffer data was no longer
                invalidated by the vacuum) but vacuum could only vacuum
                rows older than the vacuum command itself
  Version 8.2 : vacuum can clean obsolete rows even if they are newer than the
                vacuum
  Version 8.3 (newest): vacuum optimizations for large tables with no indexes

If the 5 gigabyte table was updated frequently, a lot of obsolete rows are
generated.  In 8.2, vacuum may have to run continually (starting a new one as
soon as the last one completed) and, even then, it may not keep up with the
volume on a full-time online system.  However, in 8.3, such a table is cleaned
in a reasonable time due to the new optimizations (provided there are no
indexes on the table).

This is an area of continual innovation: look for further improvements in the
upcoming releases of PostgreSQL, including vacuuming large tables with indexes.

Afilias has large tables like this, but they tend to be less rows and more
columns (e.g. user contact and address information).

Ultimately, a table with such a large number of rows should be partitioned
for better performance.


Alex B: Explain table partitioning

Partitioning refers to breaking up tables into smaller tables (for example,
dividing a large table into 12 separate tables, one for each month of the year).

Alex B pointed out Google split tables across hundreds of servers and has one
master server which initiates a query simultaneously on all servers and
collates the results.

Chris B said that a company had released a similar product for PostgreSQL
which divides up tables across 8 nodes so that each node has 1/8 of the data.


Abdullah H: How does MySQL compare for handling large tables?

The best bet on MySQL, because of the locking transactions, is to stick with
the MyISAM engine for the example table of 5 gigabytes.  MySQL does not have
transactions and will not guarantee relational integrity: that is, it is not
possible to have a perfect snapshot of the database at any one time.  However,
MyISAM can update the column data "in place" quickly without generating new
rows or locks.

Dark H asked if nightly downtime had an effect compared to full-time online.

Chris B pointed out that 100 million rows takes a long time to dump and
reload, even if the site is down for part of the night.


Chris B: Explaining table spaces

PostgreSQL can use table spaces.  This refers to PostgreSQL talking control
of a partition (or file system) for exclusive use for storage of tables for
faster access.  Put high volume tables on fast drives with table spaces for
fast access.

Abdullah A asked about using RAM drives.

Chris B said that PostgreSQL doesn't handle massive table corruption well,
such as losing several tables in a database due to a system failure or
power outage.  However, RAM drives (How-To) are very useful for quickly
testing software that uses PostgreSQL.


Alex B: How does a system monitoring program (e.g. Nagios) work with Slony?

The best method of testing is keeping your tests as few as possible but choose
tests that give a complete picture.

Slony has a database view which shows the current replication status, including
delays in replication, on the master node.  This can be used to graph the
current status and verify that the databases are alive.

Afilias also uses an "application test": running several common business
processes every 5 minutes, such as creating and destroying a DNS domain.  This
narrows the point of failure to a 5 minute interval.

Together, the two tests ensure both that the databases are alive and the
business functionality is working correctly (at least, internally--it doesn't
verify Internet access to the top level domains).


Ken B: What about the Firebird database which was announced a few years ago?

Interbase was originally created at Novell and then was folded into Borland.
It may have originally been developed on VMS or UNIX.  It was a good, low
maintenance database, although somewhat dated.  Borland decided to release it
as open source.  Unfortunately, they didn't check for backdoors: the source
code revealed many security vulnerabilities which Borland had to patch for all
existing Interbase customers.  Borland became reluctant to support the open
source version.

A community of developers took over the code and it became "Firebird".  There
are perhaps 30 active developers on Firebird.  The original Interbase designer
was heading the project but was hired away by MySQL.

Firebird has stored procedures, triggers, relational integrity, a small
footprint and, of course, Interbase compatibility.  However, the code is old
and support is limited compared to MySQL or PostgreSQL.

Chris B mentioned Btrieve, a Novell database used with many of their products.
It was basically an ISAM (Wikipedia) engine.  When SQL support was added, Btrieve became
"Pervasive".  Developers took PostgreSQL, put emulation on it that allowed it
to connect to Btrieve, etc.  This version of Btrieve, for a while, was
PostgreSQL.


Chris B: A Toronto PostgreSQL Users Group

Chris B was interested in starting a PostgreSQL users group.
He didn't have time to be the "point man" but knew of many people who would
be interested in joining.  User group popularity is growing and there is a
large PostgreSQL community in Toronto.  Colin M and Ken B offered to help
promote this project.  Chris B requested email addresses for anyone who might
be interested in joining such a group, if it was formed.

Chris B will be speaking at PGCON PostgreSQL conference in May in Ottawa.


Next Dinner

Unless otherwise notified, the next PegaSoft dinner meeting is
Thursday, February 21, 2008.  Location: Linux Caffe  Speaker: Chris F. A.
Johnson
 
     

« Truth Humility Communication Nobility Freedom Purity Excellence Right Support Courage Compassion Quality Honesty Trust Cooperation Challenge Education »
PegaSoft Canada - A Linux Association Since 1994