All you need is PostgreSQL

55
All you need is PostgreSQL databases performance programming ebellani.github.io
via schonfinkel 5 days ago |
caches

Archive.org

Ghostarchive
19 comments

1. 11
  
  peter-leonov 5 days ago
  
  …and ClickHouse.
  1. 4
    
    seabre 5 days ago
    
    Yeah. ClickHouse has turned into the thing I reach for when I need OLAP.
  2. 2
    
    b--man 4 days ago
    
    only if you need ad hoc analytical queries. And even then, do try duckdb first.
    1. 1
      
      peter-leonov 3 days ago
      
      Makes sense. Why do you prefer DuckDB over ClickHouse?
      1. 2
        
        b--man 3 days ago
        
        DuckDB is based on top notch research from the folks behind cedardb (and others). See https://duckdb.org/why_duckdb#peer-reviewed-papers-and-thesis-works
        
        clickhouse seems basically to have grown out of the performance needs of a company, and it shows.
        
        1
        
        peter-leonov 2 days ago
        
        Fair point.
        
        and it shows
        
        Do you know concrete points I should be aware of before making my bet?
        
        1
        
        b--man 2 days ago
        
        clickhouse was not acid for a long while, basically returning control before ensuring durability. Transactions are still experimental. I'd not trust it for anything in production
        
        https://medium.com/@pranavmehta94/clickhouse-fsync-what-the-history-tells-us-61a370ec8a2d https://clickhouse.com/docs/guides/developer/transactional
        
        1
        
        peter-leonov edited 2 days ago
        
        anything in production
        
        What counts as production seems to vary a lot in this thread.
        
        This is enough production for me: https://www.google.com/search?q=clickhouse+site%3Ahttps%3A%2F%2Fblog.cloudflare.com%2F&ie=UTF-8&oe=UTF-8&hl=en-us&client=safari
        
        Thanks for sharing the fsync blog. I might be reading it wrong though.
        
        In other words, fsync was never intended to provide comprehensive durability guarantees—it addresses only a narrow class of failures. More importantly, ClickHouse successfully operated for over a decade without it. Its durability model has always relied on replication and distributed architecture, not synchronous disk flushes.
        
        1
        
        b--man 2 days ago
        
        just look at CH transaction page and count the caveats. If you care about consistency, I'd advise to stay away from that. YMMV
  3. 2
    
    veqq 5 days ago
    
    Why so? I know nothing about it.
    1. 3
      
      Verfeuil 4 days ago
      
      I use Clickhouse at work. It's very nice for handling large amount of data (i have hundred of billions of lines and i can query them in minutes to find patterns on hardware from 2012), but what i really like are :
      
      TTLs (you can auto expire rows, columns, in a dynamic way)
      
      Materialized view, both refreshable and classic  (you can transform data automatically, and deserialize json / do join when some data is inserted in a given table, and the data is appended to another table)
      
      The fact you can query URLs (e.g. csv files) and populate a table on a schedule
      
      Compression you can change, and choose the best compression for the task at hand
      1. 1
        
        peter-leonov 3 days ago
        
        Very true. I'd also add native support for clustering and/or sharding on all levels: SQL keywords, built-in efficient consensus, table layout.
  4. 2
    
    dark_grimoire 5 days ago
    
    Technically you also have TimescaleDB?
2. 5
  
  peter-leonov 5 days ago
  
  stored_procedures_everywhere.gif
  
  On the serious note, I expected to learn how Postgres' rich type set allows to store almost anything, or how its locking primitives remove the need in external lock system coordination, or how different isolation levels cover for a spectrum of concurrency heavy apps, or how to design and tune a table to allow cheap write heavy storage, or how Postgres CDC plugins enable transparent heterogeneous replication, or how embedded scripting allows to pull the client-server app inside out where latency is critical, or how… stop-token.
  1. 3
    
    andersmurphy 5 days ago
    
    There's only one correct isolation level.
    1. 2
      
      peter-leonov 3 days ago
      
      I feel you. In the everyday apps where sound changes are top priority this is true. But not every problem is a money transfer. A simple example would be multiplayer games with single row stat updates without blocking range slightly off analytics.
3. 1
  
  tomjakubowski edited 5 days ago
  
  That OLAP approach is pretty interesting. Basically you materialize the ledger sums the application needs ahead of time with triggers. One nice thing about an OLAP database (or something like DuckDB querying parquets) is you can run whatever stats you want on your data columns, relatively quickly, without needing to anticipate what intermediate data you'll need to compute it. This flexibility is valuable for business reporting and other analytical purposes.
  1. 3
    
    b--man 4 days ago
    
    ad hoc analytical queries is one of the few places where I advise people to reach for something else. And that would be DuckDB.
  2. 2
    
    peter-leonov 3 days ago
    
    Yep. OLAP is also giving better compression because of the on-disk table layout through per column compression codecs optimized for similar data being packed in long continuous vectors.