Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
You Should Add Debug Views to Your DB (chrispenner.ca)
72 points by ezekg 12 hours ago | hide | past | favorite | 31 comments




To help understand why a complex query with many joins comes back empty, I like to provide helper debug functions that builds the query step by step. Each block checks one prerequisite and tells the user what’s missing:

```sql FUNCTION debug_user(user_id):

  IF NOT (SELECT FROM user WHERE user.user_id = user_id) THEN
    -- user doesn’t exist
  END IF;

  IF NOT (SELECT FROM user
           JOIN user_addr ON user_addr.user_id = user.user_id
           WHERE user.user_id = user_id) THEN
    -- user has no address
  END IF;

  -- keep adding joins until you find the break
```

because when you have a query that involves 6 relations, and you don't get results, it is nice to know why.


I do the "same thing", but using PostgreSQL `EXPLAIN ANALYZE`. EXPLAIN ANALYZE has information about the number of rows returned, which means I know exactly which node in the query plan failed to return expected values.

Not for function though unless you specify it as a debug flag inside the function?

don't mean to push the goalpost (didn't mention the following in the above post)

these functions can also be exposed in admin ui's making it easier for the general admin users/support users.

another very useful use case is when row level security (RLS) is enabled.


You can can also run a detailed explain plan and with a suitable visualizer it’s trivial to see where the query fell apart. You get to see the actual data distribution this way.

If you build your queries with CTEs, it is easy to pull them out in a console and query each step separately to find the issue.

now we're talking

Counterpoint: you should not be connecting to your production database, you should not be running non-critical queries on production database servers, and you probably shouldn’t have permission to see all this data about your users.

Obviously your mileage may vary, your scale is your own, your trade offs are your own trade offs.

But be aware that there comes an operational scale where this is not an acceptable way - operationally, legally, privacy-wise - to investigate customer issues, and you’ll need different tricks.


Counter-counter point: for most databases that “operational scale” will never come.

Scale isn’t just measured in transactions per second - it’s also measured in dollars, and compliance risk, and legal exposure.

What are the tricks for investigating customers' data that don't violate privacy?

https://postgresql-anonymizer.readthedocs.io/en/stable/ can be helpful, though I don't know if it scales to the parents standards.

https://www.xkcd.com/1737

If you're OBVIOUSLY not the target audience you don't have to dismiss it because it doesn't fit your usecase. There's probably a thousand "apps" where this is just fine for every one "Sry we work with the government or are planet scale apps" you're talking about.

It's exhausting to read dismissive online dick-measuring comments, if you have the issues you're explaining you already know this doesn't apply to you. It's on the same level as "Bro I asked a question to an LLM and it gave an interesting answer and I'm unique because nobody but me can ask questions to LLMs like I can" style posts.


I don’t think I was being dismissive, I was just pointing out the lack of universal applicability of this suggestion.

It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

And that level of concern isn’t reserved for planet-scale - once you have a couple of million dollar contracts on your B2B SaaS platform you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

And I shouldn’t need to say that user privacy ought to be a concern even for small operations.


> It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

Dismissive, everyone knows this but they probably can't be arsed/don't care

> Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

Not everyone has these ambitions

> And that level of concern isn’t reserved for planet-scale - once you have a couple of million dollar contracts on your B2B SaaS platform you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

Sure, but you're talking about "seriousness" with the same dismissive "I'm better" tone here again, your usecase and the business you work for doesn't reflect what everyone else is doing

> And I shouldn’t need to say that user privacy ought to be a concern even for small operations.

Depends a lot on what PII you're collecting. But rather than stating "You shouldn't collect PII you don't need" since I don't know your usecase I'll say "I try to minimize the PII I collect so I don't have to deal with these issues yet".


This is not unreasonable, and will be standard practice at many large companies. Judging from downvotes some of us are just too cool for that.

I connect to a production replica read-only. Many coworkers aren't even allowed that. Any DDL change has to go thru reviews & approvals etc. and is too much trouble, so I just keep a set of queries in git.

Also, any defined view in the db becomes a dependency that people are scared of breaking because who is using it and for what? This becomes especially true when random things connect to your db (bad but too late to do anything about it...) Now you can't change it without everyone yelling at you, and worse yet, a necessary data migration that affects such views means you have to fold them into your data migration project.

Working on giant corporate legacy systems is painful but it pays the bills...


The next logical step is to realize that using these views to drive your actual application has great potential to both vastly simplify the logic and make it super high performance. You can even take this so far as to make a view that spits out XML (or I guess json): you basically get SSR for free with easy to inspect/compose queries.

I implemented something similar for a 3M/day ad tech platform. We created a few materialized views for my boss, myself, and the DevOps team to monitor, instead of querying over 100 tables. I used stored procedures in PostgreSQL, which made it fast, efficient, and non-blocking also not to mention we avoided setting up complex Grafana/UI in admin dashboard, boss: i want to know x, y ,z, okay -> tableplus, -> export csv boom!

I'm trying to work out why 'views can help you see whats going on in your database' has got to the front page of HN. Is it that we've had a whole generation grow up with ORMs and so only people over 50 know how to actually use a RDBMS?

This is one thing I liked about my uni course back in the day. One of the assignments was to build an entire application within SQL (using triggers, stored procedures, etc) and very minimal bindings to a skeleton written by the professor. It's impressive how much you can do with just Postgres or MySQL

Not over 50, but yes. People are deathly afraid of using RDBMS features, like triggers, events, etc.

"Obviously querying over project_shorthand = '@unison/cloud' isn't going to be able to use an index, so isn't going to be the most performant query"

If you know you're going to be querying on username and project shorthand a lot you're just a handful more lines away from instantiating that as a database function which can take your shorthand apart with string functions and get back to a high-performance query. Functions can return rowsets which you can then further filter with WHERE clauses.

Although in that case I think I'd just take the first and second parameters separately anyhow, since

    debug_contribution("sophie", "@unison/cloud")
and

    debug_contributions("sophie", "unison", "cloud")
isn't a big enough difference to be worth writing string-parsing code. But do as you like.

If you're using it regularly, you can make it a derived column and index it in most RDBSs. Then it'll work with predicate push-down and you can do other fancy things like store it on alternative partitions to optimize block reads when you're not using it.

The other obvious benefit is that it is no different in semantics than any other column, so there's no advance knowledge necessary for other users vs using a function.


I was very confused about the term "Debug" view. We just called them views and used them normally. The main challenge was predicate pushdown which, though usually worked, was somewhat dependent on how smart the optimizer was. Also, things could get tricky when you start building views on top of views. Soon you are not certain why a simple looking query selecting from a simple looking view became slow until you see that it is views all the way down. As with most things, balance is the key.

[deleted]


I get the sentiment but it is an odd one that we'd consider adding a name to a list of tables (that can be sorted and filtered) that we generally don't look at as much as our code* as more clutter than adding a SQL to our codebase.

* I realize this is not true for everyone


Or write the query as a function in code, that way, everyone can know it exists, call it easily, and change it along with any other changes to your logic or schema.

So now you have to launch a special debug instance of your app that doesn't accept prod traffic, but can still connect to the prod DB, so you can shell into it and call the function?

You can and should store SQL queries in VCS. Everyone can still know that they're there.


I do the same but it typically only stays my personal sphere or documented until you get to something like retool/admin ui as a common place.

I agree. So I did. And over the span of a year I was the only one that used them. ¯\_(ツ)_/¯

I address that problem by scheduling a brown bag lunch and inviting devs from related teams to join in so I can present my cool new tools and techniques to everyone. Sometimes it's wasted effort, but sometimes it does result in wider usage

This plus some automated assertion checks on whatever invariants you can think of makes for a very cheap and scalable canary.

See an alert? Just query the view and you should have all the client/job/whatever IDs you need to trace the problem.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: