
Show HN: Dassana. JSON-native, schema-less logging solution built atop ClickHouse
25 points by gauravphoenix on April 21, 2022 | 15 comments
Hello HN, I’m Gaurav, founder & CEO of Dassana. We are coming out of stealth today and would like to invite the community to give us a try: https://lake.dassana.io/

First, a bit of backstory. I grew up with grep to search log files; I'm the kind of person whose grep was aliased to grep -i. Then Splunk came along. It was a game-changer. For every single start-up I founded (there are a few) I used Splunk, and quite often we would run out of our ingestion quota. SumoLogic wasn’t cheaper either, so we looked into DataDog. It was good until we started running into issues with aggregate queries (facets etc.), rehydration took forever, and the overall query experience was not fun (it wasn’t fun with Splunk or SumoLogic either).

All these experiences over the last two decades led me to wish for a simple solution where I could just throw a bunch of JSON/CSV data at it and query it with simple SQL. These days most logs are structured to begin with, and the complexity of parsing logs to extract fields has moved to log shippers such as fluentd, logstash, etc.

Enter HackerNews and ClickHouse.

I first learned about ClickHouse from HackerNews and was completely floored by its performance. Given its speed and the storage savings of columnar storage, it was an obvious choice to build a logging solution on top of. As we started doing a POC with it, it became obvious that it would be a perfect fit for us if we could solve the problem of schema management. That’s what we have been working on over the last six months or so. We designed a storage scheme that flattens JSON objects, and we expose an SQL interface that takes a SQL query and converts it into a query against our schemaless table.

Being JSON native, we allow querying specific JSON objects inside arrays. This is something that is not possible with many logging vendors, and if you use something like Athena, good luck figuring out the query: it is possible but quite complicated. Here is a sample query: select count(distinct eventName) from aws_cloudtrail where awsRegion = 'us-east-1'
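For contrast, here is roughly what this looks like in Athena/Presto when events arrive wrapped in a JSON array. The table and column here are hypothetical (this assumes a records column of type array(json)), but the amount of ceremony is the point:

  -- Presto/Athena: unnest the array, then pull fields out of each
  -- element with JSON-path functions.
  SELECT count(DISTINCT json_extract_scalar(r, '$.eventName'))
  FROM cloudtrail_logs
  CROSS JOIN UNNEST(records) AS t(r)
  WHERE json_extract_scalar(r, '$.awsRegion') = 'us-east-1';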

Also, there are no indices, fields, facets, etc. in Dassana. You just send JSON/CSV logs and query them with zero latency. And yes, we do support distributed joins among different data sources (we call them apps); see the sketch below. Like any other distributed system it has limitations, but it generally works great for almost all log-related use cases.
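As an illustration, a cross-app join might look like this (all table and field names here are hypothetical, not our actual schema):

  select c.userIdentity, count(*) as rejected_flows
  from aws_cloudtrail c
  join vpc_flow_logs v on c.sourceIPAddress = v.srcAddr
  where v.action = 'REJECT'
  group by c.userIdentity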

One amazing side effect of what we built is that we can offer a unique pricing model that is a perfect match for logging data. Generally speaking, log queries tend to be specific: there is always some sort of predicate, such as a user name, hostname, or IP address. But these queries run over large volumes of data. As such, they run insanely fast on our system, and we are able to charge separately for queries and reduce the cost of ingestion dramatically. In general, we expect our solution to be about 10x cheaper (and 10x faster) than other logging systems.

When not to use Dassana? It is not suitable for unstructured data. We don’t offer full-text search (FTS) yet; we are more like a database for logs than a Lucene index for text files. With more and more people starting to use structured logs, this problem may go away on its own, but as I said, we do plan to offer FTS in the future. Note that you can already use log shippers such as fluentd, vector, logstash, etc. to give structure to your logs.

What’s next? 1. Grafana plugin. Here is a sneak preview: https://drive.google.com/file/d/1JKnX5Aa6cp_pYnMiFzAojA24bjUn28WM/view?usp=sharing

2. Alerting/Slack notifications. You will be able to save queries and get Slack notifications when results match.

3. JDBC driver.

4. TBD. You tell us what to build. Email me and I will personally follow up with you: gk @ dassana dot input/output

I will be online all day today, happy to answer any questions. Feel free to reach out by email too.



* Who do you see as your competition? AWS's CloudWatch / Centralized Logging? Splunk? GCP's Logging? Logstash? Graylog?

* What kind of query language are you thinking? I imagine SQL-like, as that's Clickhouse's native language.

* Business-wise, how are you gonna integrate with the cloud providers, AWS / GCP / Azure? Most people who use those services just use the built-ins.

* More than Grafana, I think you need something like Metabase integrated OOTB. That might be a killer feature.

* IMHO, FTS is a must-have from day 1. Most software that folks run produces non-structured logs OOTB (sad, I know), so folks won't even be able to try your service without changing their software. And getting a lot of software, even popular ones like Python/Flask, Ruby/Rails, Java/Spring, to produce structured logs is not a simple task.

Best of luck!!


>Who do you see as your competition? AWS's CloudWatch / Centralized Logging? Splunk? GCP's Logging? Logstash? Graylog?

More like Athena/Presto/Snowflake. Simply put, anyone offering DB-like systems for querying structured logs.

>What kind of query language are you thinking? I imagine SQL-like, as that's Clickhouse's native language.

Pretty much CH-like SQL with some syntactic sugar for JSON. https://docs.dassana.cloud/docs/query/sample-queries#filter-...

>Business-wise, how are you gonna integrate with the cloud providers, AWS / GCP / Azure? Most people who use those services just use the built-ins.

Cheaper, faster, and easier. Let me openly challenge anyone: take a nested JSON document, send it to a cloud logging service, and query it. Now do the same with Dassana. You will find a night-and-day difference. I agree that most folks start with the built-in services, but they soon grow out of them. Here is an example: if you are using GCP, try getting a count of failed HTTP requests grouped by host or IP. It turns out there is no support for aggregate queries, and you will have to create a bunch of complicated metric filters to achieve it.
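With SQL over structured logs, that question is a single statement (hypothetical table and field names for illustration):

  select host, count(*) as failed
  from http_logs
  where status >= 500
  group by host
  order by failed desc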

>More than Grafana, I think you need something like Metabase integrated OOTB. That might be a killer feature.

That's an awesome suggestion; we are going to look into it.

>IMHO, FTS is a must-have from day 1. Most software that folks run produces non-structured logs OOTB (sad, I know), so folks won't even be able to try your service without changing their software. And getting a lot of software, even popular ones like Python/Flask, Ruby/Rails, Java/Spring, to produce structured logs is not a simple task.

Agree with your sentiment; for now we are focusing on use cases where you have structured logs. SecOps teams have such use cases: these teams mostly deal with data like CloudTrail, VPC Flow Logs, ALB logs, etc.


> * More than Grafana, I think you need something like Metabase integrated OOTB. That might be a killer feature.

Nowadays you can connect directly to CH from many BI tools, and the right choice depends on report types and personal preferences. For example, our SeekTable has a built-in connector for ClickHouse and supports two different drivers: one for the binary TCP interface, another for the HTTP(S) interface.


Are you using the new JSON column type released in ClickHouse 22.3?

https://clickhouse.com/blog/clickhouse-22-3-lts-released/


Not yet. It is quite rough around the edges and far from production use. Besides, it would require creating multiple tables/columns for each schema type (i.e. a GitHub schema might conflict with a GitLab schema). As such, we decided to flatten the JSON, use the map data type, and build our own SQL layer that translates queries into the underlying ClickHouse queries. This allows us to add a lot of syntactic sugar, e.g. https://docs.dassana.cloud/docs/query/intro
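For illustration, here is a minimal sketch of a map-backed layout of this kind. This shows the general technique, not our actual schema:

  CREATE TABLE logs
  (
      app   LowCardinality(String),
      ts    DateTime,
      attrs Map(String, String)
  )
  ENGINE = MergeTree
  ORDER BY (app, ts);

  -- Flattened JSON fields land as map entries.
  INSERT INTO logs VALUES
      ('aws_cloudtrail', now(),
       map('eventName', 'AssumeRole', 'awsRegion', 'us-east-1'));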

We might start using that feature in the future.


What do you mean by flatten the JSON? How is it stored in ClickHouse?


We flatten it to a map and store the map. When you query, we dynamically generate a ClickHouse query that queries the map.
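Roughly along these lines (illustrative, assuming the map-backed layout sketched above, not our exact translation):

  -- User-facing query:
  --   select count(distinct eventName) from aws_cloudtrail
  --   where awsRegion = 'us-east-1'
  -- Generated ClickHouse query over the map column:
  SELECT countDistinct(attrs['eventName'])
  FROM logs
  WHERE app = 'aws_cloudtrail'
    AND attrs['awsRegion'] = 'us-east-1';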


If the JSON is nested, how do you put it in the map?

Since map key is one type, how do you handle multiple types of values?


We flatten nested JSONs too; see the illustration below. Not sure what you mean by the second question, can you rephrase or provide an example? Also, feel free to drop me an email or join Slack [1]; it will be easier to discuss tech details there.

[1] https://dassanacommunity.slack.com/join/shared_invite/zt-teo...
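To illustrate the flattening (the key scheme shown here is an assumption for illustration, not our documented format), a nested object like

  {"user": {"name": "alice", "roles": ["admin", "dev"]}}

becomes map entries along the lines of

  user.name     -> alice
  user.roles[0] -> admin
  user.roles[1] -> dev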


How do you compare to https://betterstack.com/logtail which also seems to be built on Clickhouse?


A few differences:

- Even though Logtail is quite cheap, we are an even cheaper solution.

- Our pricing model separates ingestion from query. If you don't query, you don't pay for query, just ingestion.

- We are JSON native. Our SQL allows querying JSON fields that are nested under JSON arrays.

- Performance. We believe our solution is much faster for selective queries, though like most performance claims, it all depends on the data shape, volume, and what you are querying.

Similarities:

- We both use ClickHouse as the underlying DB.


Cool product and pricing model

> Cloud Log Lake

That's the first time I'm hearing a Clickhouse backend described as a lake. Care to explain?


Generally speaking, tech like ClickHouse is considered data warehouse tech. But we are using it as data lake tech: there are no schemas involved in Dassana. This means you can send free-form JSON objects to Dassana and query them using SQL.


I'm wondering if all these logging solutions that don't offer traces have any kind of future.


It depends; there is plenty of market for purely structured log queries. I come from a security background, and these days in cloud security all logs are structured: CloudTrail, VPC Flow Logs, ALB logs, etc. We are going to be focusing on such use cases.





