Postgres

  • Join Planning

    • If the planner picks a good join plan, query performance will be acceptable

    • A query might have 20 or more joins in it
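
    A quick way to see the plan the optimizer chose (table names here are hypothetical):

      -- EXPLAIN shows the join plan the planner picked; ANALYZE
      -- actually runs the query and reports real timings
      EXPLAIN ANALYZE
      SELECT *
      FROM orders o
      JOIN customers c ON c.id = o.customer_id;

      -- Beyond this many joins the planner stops exhaustively
      -- searching join orders
      SHOW join_collapse_limit;    -- 8 by default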

  • Postgres - lots of great open source tooling around it

    • pgAdmin - simple Postgres admin interface

    • Tons of plugin libraries

  • Performance

    • Multi-version concurrency control - concurrent queries can run on multiple cores without blocking each other

    • Non-blocking indexes - can create and rebuild indexes without blocking reads and writes on the table

    • Partial indexes - can index a subset of the data (see the sketch after this list)

    • Datatypes - boolean, ranges (e.g., time ranges), arrays, XML, JSONB, money, binary (bytea)

    • Indexes - help with performance - B-tree, GIN, GiST, BRIN, Hash

    • Extensions - cron jobs, stored procedures, fuzzy matching, PostGIS
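
    A minimal sketch tying a few of these features together, assuming a hypothetical orders table; the partial index covers only a subset of rows, and CONCURRENTLY builds without blocking:

      -- Hypothetical table exercising a few Postgres datatypes
      CREATE TABLE orders (
          id         bigserial PRIMARY KEY,
          active     boolean NOT NULL DEFAULT true,
          tags       text[],                  -- array type
          details    jsonb,                   -- binary JSON
          created_at timestamptz NOT NULL DEFAULT now()
      );

      -- Partial index: indexes only the subset of rows WHERE active
      CREATE INDEX orders_active_idx ON orders (created_at) WHERE active;

      -- GIN index to speed up containment queries on the JSONB column
      CREATE INDEX orders_details_idx ON orders USING gin (details);

      -- Non-blocking build: doesn't lock out reads and writes
      CREATE INDEX CONCURRENTLY orders_tags_idx ON orders USING gin (tags);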

Episode 496: Bruce Momjian on Multi-Version Concurrency Control in Postgres (MVCC)

www.se-radio.net/2022/01/episode-496-bruce-momjian-on-multi-version-concurrency-control-in-postgres-mvcc/

  • Stonebraker built Postgres for extensibility, and it's come in handy a ton

  • The I in ACID - Isolation

    • You interact with the database as if you were the only one using it

    • Static data is the easiest to work with; volatile data with multiple users is hard

    • ACID capabilities make it easier to interact with the database

    • Isolation - I won't see other people's changes while I'm doing my own work

    • Complex stuff gets pushed into the database; all of the shared state within the database is easier to manage than having applications share everything with each other

  • Locking as a solution?

    • One giant lock that keeps everyone else out while I go in; that doesn't work super well - concurrency is terrible

    • 1970s-90s databases - the approach was "let's not lock everything - we will make the locks granular"; you would lock a table at a time; while I was in it, no one else could get into it

    • Then they got to the point where they could lock pages; data was broken down into pages, and I would lock only the pages I was modifying

    • Some of the databases got to row-level locking - lock only the row that I'm modifying

      • First problem - a huge amount of locking; it doesn't really solve the concurrency problem, just chops it into smaller pieces

      • Second problem - worse. Lock escalation - the database might not know what your intent was; if you locked a lot of rows, it might lock the whole page - escalate the lock; you could imagine multiple people doing these things together and running into each other

      • In the early 90s and prior, lock escalation was a huge issue; as a job got bigger, its locks started spilling out into other places

    • Informix - 1990s; I would run it at 8:30pm; you needed to guess how big to make the lock table so you didn't overflow anything

      • So hard to track these locks

      • Databases became this ugly thing that was hard to deal with

  • If all we have are reads, then everyone can share; if everyone has read and write access, it's an issue

    • When you have a single copy of the row, as soon as one writer modifies it, the old value isn't in the table anymore

    • All of a sudden, not only the writers were affected but also the readers

  • Multi-version concurrency control

    • Different method of doing database updates

    • The space-saving way was to have one copy of the row; the way this paper wanted to solve it was to create multiple versions of a single row; this seemed ludicrous - you could have 5 copies of the same row in the database

    • The problem is someone coming to read data while someone else is updating it. If we create a new version of the row with the new data, we can let readers read the old version while the newer version is being created (maybe committed, maybe not). That makes it possible to give snapshots to users of the database; the problem of readers getting blocked by writers goes away
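
    You can see this machinery directly: every Postgres row version carries xmin/xmax system columns recording which transaction created it and which expired it. A small sketch with a hypothetical table t:

      CREATE TABLE t (v int);
      INSERT INTO t VALUES (1);

      -- xmin = ID of the inserting transaction, xmax = 0 (not expired)
      SELECT xmin, xmax, v FROM t;

      UPDATE t SET v = 2;

      -- xmin is now the updating transaction's ID; the old version
      -- still sits on disk, invisible, until pruning or vacuum removes it
      SELECT xmin, xmax, v FROM t;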

  • Snapshot

    • A record created when you start your query; the snapshot taken at the beginning lets the system distinguish which of the multiple versions of a row should be visible to you

    • The snapshot says which of the 5 versions of a row is visible to my transaction; maybe one of them, maybe none
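
    Postgres will show you the current snapshot; a sketch using pg_current_snapshot() (Postgres 13+; older versions have txid_current_snapshot()):

      BEGIN ISOLATION LEVEL REPEATABLE READ;

      -- Returns xmin:xmax:xip_list - the oldest still-running xid,
      -- the first unassigned xid, and the in-progress xids, all of
      -- which are invisible to this transaction
      SELECT pg_current_snapshot();

      COMMIT;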

  • Transaction IDs

    • Read-only queries like SELECT typically don't get assigned a transaction ID
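
    A sketch of that lazy assignment, using pg_current_xact_id_if_assigned() (Postgres 13+):

      BEGIN;
      -- Read-only statements don't force a transaction ID to be assigned
      SELECT count(*) FROM pg_class;
      SELECT pg_current_xact_id_if_assigned();   -- NULL: no xid yet

      -- The first write forces a real transaction ID to be assigned
      CREATE TEMP TABLE scratch (n int);
      SELECT pg_current_xact_id_if_assigned();   -- now returns an xid
      ROLLBACK;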

  • Visibility

    • Who has visibility into the data at what time

    • If someone writes a 1 and someone else writes a 2 at the same time what do I see

    • It depends on when the query was started

    • Someone who started a transaction before me or after me potentially could see a different set of values

    • High-write-volume requirements - you need to split apart the visibility concept and let people see different things; that's what MVCC helps you do

  • What is the algorithm for determining which row to read or write if there are multiple versions of the same row

    • When you get your snapshot at the beginning, you are guaranteed to see all transactions committed before your snapshot

    • Corollary: any work in progress, or work that starts after your snapshot, will not be visible to you
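
    A two-session sketch of those rules (session labels are illustrative; reuses the hypothetical table t):

      -- Session A                          -- Session B
      BEGIN ISOLATION LEVEL REPEATABLE READ;
      SELECT v FROM t;    -- sees the old value
                                            BEGIN;
                                            UPDATE t SET v = v + 1;
                                            COMMIT;
      -- B committed after A's snapshot was taken, so A still sees
      -- the old value
      SELECT v FROM t;
      COMMIT;

      -- A query run now takes a fresh snapshot and sees B's update
      SELECT v FROM t;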

  • Readers never block writers; writers never block readers

    • Readers clearly don't block other readers

  • What about two writes at the same time?

    • Writers do block other writers

    • We have to know whether the previous transaction completed or not, because we need to update the most recent version of the row

    • A writer has to see the newest version of the row; when you try to update or insert where a row has been modified but not committed, we stop the INSERT or UPDATE until that transaction commits or aborts - we take a lock on it

    • In some distributed databases you might have situations where you don't lock
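
    A sketch of writer-on-writer blocking (two sessions, same hypothetical table t):

      -- Session A                          -- Session B
      BEGIN;
      UPDATE t SET v = 10;  -- row locked
                                            BEGIN;
                                            UPDATE t SET v = 20;
                                            -- blocks, waiting on A...
      COMMIT;
                                            -- ...now proceeds against
                                            -- the newest committed
                                            -- version
                                            COMMIT;

      -- To fail fast instead of waiting, a writer can probe the lock:
      SELECT v FROM t FOR UPDATE NOWAIT;  -- errors if the row is locked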

  • Transaction Isolation levels

    • Read committed - every new statement gets a new snapshot, including every query in a multi-statement transaction

    • Repeatable read - all statements in a transaction get one snapshot; all queries in the transaction see a consistent view

    • Serializable - as you run through a multi-statement transaction you may read rows (as they are being updated) but can't update them; the database will check whether anything has been modified underneath you while your multi-statement transaction was running
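
    How these are requested in Postgres (read committed is the default):

      -- Default: each statement takes its own snapshot
      BEGIN ISOLATION LEVEL READ COMMITTED;   -- same as plain BEGIN
      SELECT count(*) FROM t;
      COMMIT;

      -- One snapshot for the whole multi-statement transaction:
      -- both counts below return the same answer
      BEGIN ISOLATION LEVEL REPEATABLE READ;
      SELECT count(*) FROM t;
      SELECT count(*) FROM t;
      COMMIT;

      -- Strongest level: may abort with a serialization failure
      -- (SQLSTATE 40001), in which case the application retries
      BEGIN ISOLATION LEVEL SERIALIZABLE;
      UPDATE t SET v = v + 1;
      COMMIT;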

  • Dirty Read idea

    • Discard ACID requirements

    • I want to see data while it's being modified; I don't care about my snapshot

    • It used to be (pre-MVCC) that you would compute the number, just run it, and attach caveats about the number's accuracy

    • MVCC allows readers to read frequently - most of the time the result is pretty accurate, but obviously it's based on the snapshot at the time of the query

    • Atomicity - important to Postgres users; you wouldn't want to see changing data as transactions are processing

  • Extensibility of Postgres - how to add new data type with MVCC

    • A lot of databases have tried to add extensibility after the fact; Postgres was designed initially for it

    • Has a lot of system tables that store index methods, aggregates, stored-procedure languages, and data types, including custom ones

    • Define an input function, how long the type will be, and an output function, plus casting functions to move between data types - not super hard to set up; Postgres fits the new type into rows and puts transaction IDs on the front
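
    A sketch of the shape of a custom type definition, following the pattern in the Postgres docs; the complex type and its complex_in/complex_out support functions are hypothetical and would have to be written in C and compiled first:

      -- Input/output functions implemented in C (a shell type is
      -- created automatically when the functions reference it)
      CREATE FUNCTION complex_in(cstring) RETURNS complex
          AS 'complex', 'complex_in' LANGUAGE C IMMUTABLE STRICT;
      CREATE FUNCTION complex_out(complex) RETURNS cstring
          AS 'complex', 'complex_out' LANGUAGE C IMMUTABLE STRICT;

      -- The type itself: input function, output function, and length
      CREATE TYPE complex (
          input          = complex_in,
          output         = complex_out,
          internallength = 16            -- two float8 fields
      );

      -- A cast lets values move between data types
      -- (complex_to_point is another hypothetical C function)
      CREATE CAST (complex AS point) WITH FUNCTION complex_to_point(complex);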

  • Do most Enterprise databases support MVCC?

    • Oracle does - it takes the old row and puts it into an undo segment, with pointers to the different versions

    • Microsoft has it as an option - they tried to add MVCC, but it's hard to add retroactively; a lot of Microsoft SQL code was used to the locking behavior

    • DB2 has it available, but not by default

    • Cassandra maybe uses something similar; maybe MySQL and MariaDB do too

    • Postgres is a bit unusual - it just leaves the old rows in the table

  • When did Postgres add MVCC?

    • We added it in 2001/02

    • Postgres had the idea of time travel - see the database as of last week; you would keep old versions around, allowing you to access old versions of a row

    • Clean up of rows can be hard in high write systems

  • Clean Up process

    • Pruning - a lightweight operation that removes old versions of a row; as you do a sequential scan, you see the transaction IDs and which versions were expired by which transactions; if no one can see a specific row version, the system will restructure the page and free up the space

    • Indexes - splitting a page is a generally expensive operation; if there are a lot of dead rows in the index, you may not actually need to split - removing them first can have great benefits for Postgres

    • Autovacuum - wakes up every minute; frees up space in the background that queries won't be working with
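
    The cleanup is observable from the statistics views; a sketch (orders is the hypothetical table from earlier):

      -- Dead row versions awaiting cleanup, and when autovacuum last ran
      SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
      ORDER BY n_dead_tup DESC;

      -- Manual, non-blocking cleanup of one table
      VACUUM (VERBOSE) orders;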

  • Garbage Collection is similar

    • C - Postgres is written in C; you allocate everything and free everything yourself

    • Perl - the language counts references and frees memory when the reference count hits 0

    • Java - allocate objects on the fly; the garbage collector looks at all objects and determines which are still visible and which have been thrown away

    • Oracle - more similar to Perl - an undo segment where old rows go; references to it are managed a bit differently

  • Storage Space Re-use

    • Page pruning and autovacuum - take data that is no longer useful and free it up; a page that was 90% full is now less full; if the pages at the end of the table are empty, the table can be truncated to get rid of them

    • Same with indexes too

    • We don't free up all potential space to the operating system - if you delete every other row, we might not free up that space; there is a manual command, VACUUM FULL, that can compress the table down, but it locks the table, so it's tough to run; one of many maintenance operations

    • Can't easily unsplit a B-tree page - you have to lock it, create a new copy, and delete the old one
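
    The commands involved, shown for the hypothetical orders table:

      -- Ordinary VACUUM: marks space reusable within the table; only
      -- wholly empty pages at the end are returned to the OS
      VACUUM orders;

      -- VACUUM FULL: rewrites the table into minimal space and returns
      -- the rest to the OS, but holds an exclusive lock while it runs
      VACUUM FULL orders;

      -- REINDEX rebuilds a bloated index from scratch; CONCURRENTLY
      -- (Postgres 12+) avoids blocking reads and writes
      REINDEX INDEX CONCURRENTLY orders_active_idx;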

  • SQL developers

    • Generally need to know how MVCC and its cleanup work

  • Write amplification

    • A struggle for Postgres

    • Because of MVCC, we issue somewhat more writes than other systems; we keep the old and new versions within the same page/table and age them out, and that garbage collection issues writes to storage

    • Hint bits - tell us whether a row's transaction committed or aborted; we update them at various times, which adds writes

    • Freeze operations to handle transaction ID wraparound

    • It's a bit more write-heavy than other databases

    • It's getting better, but it's hard to solve without adding other issues

    • MVCC comes with costs - hint bits, cleanup, freezing on wraparound
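
    A sketch of watching that wraparound pressure, i.e. how close tables are to a forced freeze:

      -- age() = transactions elapsed since the table's oldest unfrozen
      -- xid; autovacuum forces a freeze as this nears
      -- autovacuum_freeze_max_age (200 million by default)
      SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
      WHERE relkind = 'r'
      ORDER BY xid_age DESC
      LIMIT 10;

      SHOW autovacuum_freeze_max_age;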

  • Momjian.us; pglife