PgFirstAid: PostgreSQL function for improving stability and performance

115 points by yakshaving_jgt 2 days ago

Hey everyone! Justin here (randoneering). Thanks for posting a link to the github repo!

I will try to sum up answers to your questions by pointing you to the roadmap in the repo, which shows where this is heading. I only included a few health checks at the start because I wanted to keep milestones reachable, and because I work full time, have three kids, am the sole provider, and try not to spend all my time in front of a screen. I will be adding more as we go.

Regarding adding this as a view: yup, that's on the list. Same with having this export directly to a table.

The intention of this function is to avoid being overly complex for general users, while still letting folks who want to dive into the details do so. Brent and his team have some great "filter" options with sp_blitz that give you either all the info you could ask for (bring the pain) or the general health check. That is one of my favorite features and I want to implement something similar. The view would help make this easier!

I have a PR that will be merged shortly to avoid the issue with folks naming their tables with special characters, capital letters, etc. I should have figured this would come up; it was on my list to handle in the next wave of work. But seeing as this escalated very quickly... it looks like it will be merged today, pending testing against pg15-18.
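
For readers wondering why such names cause trouble: PostgreSQL folds unquoted identifiers to lower case, so any SQL built by string concatenation breaks on names like `Order Items`. The sketch below is not taken from the PR; it only illustrates the built-in quoting helpers, and the table name is hypothetical.

```sql
-- Hypothetical table whose name needs quoting (capital letters and a space).
CREATE TABLE "Order Items" (id int);

-- quote_ident() adds double quotes only when the identifier needs them.
SELECT quote_ident('Order Items');   -- "Order Items"
SELECT quote_ident('orders');        -- orders

-- format() with %I applies the same quoting when building dynamic SQL.
SELECT format('SELECT count(*) FROM %I.%I', 'public', 'Order Items');
```

Inside a PL/pgSQL function the formatted string would typically be passed to `EXECUTE`.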

Just want to say THANK YOU for your comments, suggestions, and feedback. I look forward to moving this along with the community!

rom16384 2 days ago

You may also want to try check_postgres [1] and pg_insights [2]

[1] https://bucardo.org/check_postgres/

[2] https://github.com/lob/pg_insights

sumibi 2 days ago

This is inspired by the first responder kit from Brent Ozar for SQL Server, which is an amazing tool for accidental DBAs that do not know where to start optimizing.

I'm looking forward to trying this out on my postgres databases.

rapfaria 2 days ago

Damn, those are some 10k LOC SQL files, 3k+ pull requests. Talk about battle-tested.
Hopefully this grows to be the Postgres equivalent.

LunaSea 2 days ago

I would disagree on the fact that a table without a primary key is a critical problem.

There are multiple reasons for tables not having primary keys. Log tables are one example.

Excessive sequential scans are also not a problem for small tables.

davidw 2 days ago

This looks like it's targeted at finding some obvious things, and if you know your table doesn't need a primary key, you could always exclude it from the report.

1a527dd5 2 days ago

Logical replication requires a primary key. We found that out the bad way. It _is_ a critical problem.
- ahachete 2 days ago
  
  Logical replication does NOT require a primary key. It requires a primary key, a unique index, or an explicitly defined replica identity.
  Sure, that still boils down, in most cases, to having a PK (replica identity is normally not a good idea), but there are cases where this would not be the case.
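
As a rough illustration of the alternatives mentioned above, here is a minimal PostgreSQL sketch. The table and index names are hypothetical. Note that a unique index only counts once it is declared as the replica identity, and it must be non-partial, non-deferrable, and cover only `NOT NULL` columns.

```sql
-- Hypothetical table with no primary key.
CREATE TABLE app_log (
    logged_at timestamptz NOT NULL,
    source    text        NOT NULL,
    message   text
);

-- Option 1: declare a unique index over NOT NULL columns as the replica identity.
CREATE UNIQUE INDEX app_log_ident ON app_log (logged_at, source);
ALTER TABLE app_log REPLICA IDENTITY USING INDEX app_log_ident;

-- Option 2: use the whole row as the identity. This works without any index,
-- but every UPDATE and DELETE then logs the full old row.
ALTER TABLE app_log REPLICA IDENTITY FULL;
```

Without a replica identity, a published table still replicates `INSERT`s, but `UPDATE` and `DELETE` fail on the publisher.
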

somat 2 days ago

There are many good reasons to always have a primary key, even if it is just an automatic serial number, but the one that hit me personally is that it is surprisingly difficult to deduplicate a relational database.
When I was first learning SQL I was pretty firmly in the "use natural keys" department. And when the natural key was every single column I would go "what's the point?", shrug, and have no primary key. Until I started getting duplicated rows:
```
    insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
    insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
```
Duplicate rows (a) tend to mess up your query results and (b) are surprisingly difficult to remove. If I remember correctly, after spending far too long trying to find a pure SQL solution, I ended up writing a program that would find the duplicates, delete them (all of them, as I found no way to delete all but one), then reinsert them, and finally add that missing primary key.
I still like natural keys more than I probably should. (You still need a key to prevent functional duplicates even when using a surrogate key, so why not cut out the middleman?) But I am no longer so militant about it (mainly because it makes having dependent tables a pain).
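
In PostgreSQL specifically, the cleanup described above can usually be done in one statement, because the system column `ctid` gives every physical row a distinct address even when all user columns are identical. A minimal sketch, assuming the `customer_email` table from the example and no `NULL`s in the compared columns:

```sql
CREATE TABLE customer_email (name text, address text);
INSERT INTO customer_email (name, address) VALUES
    ('bob',   'bob@bobco.com'),
    ('bob',   'bob@bobco.com'),
    ('alice', 'alice@example.com');

-- Delete every row that has an identical twin with a lower ctid,
-- which leaves exactly one row per duplicate group.
DELETE FROM customer_email AS a
USING customer_email AS b
WHERE a.name = b.name
  AND a.address = b.address
  AND a.ctid > b.ctid;

-- Add the missing key so the duplicates cannot come back.
ALTER TABLE customer_email ADD PRIMARY KEY (name, address);
```

Other databases need a different trick (for example a window function over a row identifier), so this is not a portable answer.
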
- paulryanrogers 2 days ago
  
  I'm a fan of always including unique indexes in the DB, even if they must exclude soft-deleted rows. At a minimum they can keep functional duplicates out. Those seem especially insidious when there are races.
- ahachete 2 days ago
  
  Using natural keys is what actually can prevent duplicate rows. In your above example, if email is the PK, there would be no duplicates. But adding an id as a PK would essentially keep your database with duplicates:
  (1, 'bob', 'bob@bobco.com')
  (2, 'bob', 'bob@bobco.com')
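
The two replies above are compatible: a surrogate `id` can coexist with a uniqueness rule on the natural key, and a partial unique index covers the soft-delete case. A minimal PostgreSQL sketch, with hypothetical table and column names:

```sql
CREATE TABLE customer (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    email      text NOT NULL,
    deleted_at timestamptz            -- NULL means the row is live
);

-- Uniqueness is enforced only among live rows; soft-deleted rows are ignored.
CREATE UNIQUE INDEX customer_email_live_uniq
    ON customer (email)
    WHERE deleted_at IS NULL;

INSERT INTO customer (name, email) VALUES ('bob', 'bob@bobco.com');

-- A second live row with the same email is rejected with a unique violation:
-- INSERT INTO customer (name, email) VALUES ('bob', 'bob@bobco.com');
```

Because the check lives in the index, it also holds when two sessions race to insert the same email.
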

PunchyHamster 2 days ago

Log tables with pkey on date can be searched faster in typical log table use of "tell me what happened in this time range", tho of course you have to make it unique
- s1mplicissimus 2 days ago
  
  I'd rather logs not fail because for some weird reason the date of 2 records is exactly the same. Time savings adjustments / clock drift correction sound like the most obvious candidate to produce such a case. Granted, chances are not high, but I'd usually prefer knowing that the chance of it failing for this reason is 0.
- saurik 2 days ago
  
  1) But with quite a hefty penalty on writes... I'd think you would be better off without a primary key and just using a BRIN index for that use case?
  2) Even if you did really want that B-tree, you can still have it without awkwardly making it unique, as long as you don't make it a "primary" key.
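
Both options from the reply above can be sketched in PostgreSQL as follows. The table and index names are hypothetical, and in practice you would pick one index rather than create both.

```sql
-- Hypothetical append-only log table with no primary key.
CREATE TABLE event_log (
    logged_at timestamptz NOT NULL DEFAULT now(),
    message   text
);

-- Option 1: BRIN index. Very small and cheap to maintain on insert; it works
-- well when rows are physically stored in roughly logged_at order.
CREATE INDEX event_log_logged_at_brin ON event_log USING brin (logged_at);

-- Option 2: plain (non-unique) B-tree. Larger and costlier on writes, but
-- identical timestamps never cause an insert to fail.
CREATE INDEX event_log_logged_at_btree ON event_log (logged_at);

-- The typical "what happened in this time range" query either index can serve.
SELECT *
FROM event_log
WHERE logged_at >= now() - interval '1 hour'
  AND logged_at <  now();
```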

ahachete 2 days ago

Does the unused index feature look into replicas? (I guess it doesn't) It's risky to delete an unused index by looking at a single instance, since it may be used by other read replicas.

evanelias 2 days ago

> Unused Large Indexes - Indexes consuming significant disk space but never used (>10MB, 0 scans)

Is this a typo? I would think that 10MB seems ridiculously small for a threshold here.
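
For anyone who wants to try a different threshold, a query along these lines reproduces the idea. This is a sketch against the standard statistics views, not the query PgFirstAid itself runs, and the 10MB cutoff is simply the figure quoted above.

```sql
SELECT s.schemaname,
       s.relname      AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique   -- unique and PK indexes enforce constraints even if never scanned
  AND pg_relation_size(s.indexrelid) > 10 * 1024 * 1024   -- 10MB; raise to taste
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

Scan counters are kept per instance, so the same query has to be run on every read replica before an index can be called unused.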

RedShift1 2 days ago

Why are indexes on foreign keys required? If I'm doing a join, it's going to select the primary key of the other table, how will an index on the foreign key help?

1a527dd5 2 days ago

If you care about performance, they are required.
https://dev.to/jbranchaud/beware-the-missing-foreign-key-ind...
Again, another thing we learnt the hard way. All FKs now require an index for us.

formerly_proven 2 days ago

Referential integrity checks by the DB engine (e.g. when deleting from the foreign table) require reverse look-ups of foreign keys, which would necessarily become full table scans without an index. Apart from that, applications also often do look-ups like this.

magicalhippo a day ago

I'm very curious what sort of work you do with databases to even ask this question.
Clearly you don't have the kind of "load all invoice lines belonging to this invoice" type workloads that I'm used to. Is it all OLAP?
- RedShift1 12 hours ago
  
  Most of my experience is with MySQL/MariaDB which created foreign key indexes behind your back.
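
To make the difference from MySQL/MariaDB concrete: PostgreSQL indexes the referenced side (the primary key) automatically but not the referencing column. A minimal sketch using the invoice example from this subthread, with hypothetical table and column names:

```sql
CREATE TABLE invoice (
    id bigint PRIMARY KEY               -- indexed automatically
);

CREATE TABLE invoice_line (
    id         bigint PRIMARY KEY,
    invoice_id bigint  NOT NULL REFERENCES invoice (id) ON DELETE CASCADE,
    amount     numeric NOT NULL
);

-- PostgreSQL does not create this one for you.
CREATE INDEX invoice_line_invoice_id_idx ON invoice_line (invoice_id);

-- Both statements look up invoice_line by invoice_id. Without the index above,
-- each one scans the whole invoice_line table.
SELECT * FROM invoice_line WHERE invoice_id = 42;
DELETE FROM invoice WHERE id = 42;
```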

netcraft 2 days ago

Very nice!

Did you consider making this a view instead? Just curious if there is a reason why you couldn't.

singron 2 days ago

I'm not the author, but I think you could by using UNION ALL instead of temp tables. You could also make a view that just calls this function. I'm not sure why it would matter though.
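
The "view that just calls this function" idea can be sketched as below. `health_check_demo()` is a hypothetical stand-in with a made-up column layout, not the real PgFirstAid function or its output; substitute the actual function name from the repo.

```sql
-- Hypothetical stand-in for the real health-check function.
CREATE FUNCTION health_check_demo()
RETURNS TABLE (severity text, check_name text, object_name text)
LANGUAGE sql STABLE
AS $$
    SELECT 'HIGH'::text,
           'Missing primary key'::text,
           c.oid::regclass::text        -- regclass output quotes odd names as needed
    FROM pg_class AS c
    WHERE c.relkind = 'r'
      AND c.relnamespace NOT IN ('pg_catalog'::regnamespace,
                                 'information_schema'::regnamespace)
      AND NOT EXISTS (SELECT 1
                      FROM pg_index AS i
                      WHERE i.indrelid = c.oid
                        AND i.indisprimary)
$$;

-- The view is a thin wrapper, so it can be filtered and joined like a table.
CREATE VIEW health_check_report AS
SELECT * FROM health_check_demo();

SELECT * FROM health_check_report WHERE severity = 'HIGH';
```

A view built from several checks joined with `UNION ALL` would avoid the function entirely, at the cost of one large query definition.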