Domesticating applications, OpenBSD style
Posted Jul 23, 2015 21:59 UTC (Thu) by dlang (guest, #313)
In reply to: Domesticating applications, OpenBSD style by mathstuf
Parent article: Domesticating applications, OpenBSD style
Fair question.
Different things have different uses.
The archive is there to recreate anything else as needed and to provide an "authoritative source" in case of lawsuits. How long you keep the logs depends on your company's policies, but 3-7 years are common numbers (contracts with your customers may drive this when you are doing SaaS).
Being able to investigate breaches, or even just fraud, is a reason for the security folks to care.
For outage investigations (root cause analysis), you want the logs from the timeframe of the outage, and not just from the systems that went down: you want the logs from all the other systems as well, in case there are dependencies you need to track down. For this you don't need a huge timeframe, but it can help to be able to look at the logs from a period of similar load (a week/month/year ago, depending on your business) to see what's different.
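For instance, pulling the outage window out of every host's logs can be as simple as this sketch, which assumes one collected file per host and an ISO-8601 timestamp leading each line (both assumptions about your layout, not a given):

    # Pull the lines from every host's log that fall inside an outage window.
    # Assumes one file per host and an ISO-8601 timestamp at the start of
    # each line; both are assumptions about the layout, not a given.
    import glob

    WINDOW_START = "2015-07-23T21:00:00"
    WINDOW_END   = "2015-07-23T22:30:00"

    for path in glob.glob("/var/log/central/*.log"):
        with open(path) as f:
            for line in f:
                stamp = line.split(" ", 1)[0]
                # ISO timestamps sort lexically, so string compares work
                if WINDOW_START <= stamp <= WINDOW_END:
                    print(path, line, end="")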
By generating rates for the logs in different categories, you can spot trends in usage, load, etc.
By categorizing the logs and storing them by category, you can notice things like "normally these logs are this size, but they were much larger during the time we had problems". Doing it per type in addition to per server also makes it easy to see whether different servers are logging significantly differently when one of them is having problems.
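As a rough illustration, here is a sketch that buckets lines into categories and counts a per-minute rate for each; the category patterns are invented examples, and it assumes an ISO-style timestamp leads each line:

    # Count messages per category per minute, so a bad period can be
    # compared against a normal one. The patterns are made-up examples.
    import re
    from collections import Counter

    CATEGORIES = [
        ("login", re.compile(r"session opened|Accepted \w+ for")),
        ("cron",  re.compile(r"CRON|crond")),
        ("oom",   re.compile(r"Out of memory|oom-killer")),
    ]

    rates = Counter()
    with open("syslog") as f:
        for line in f:
            minute = line[:16]          # e.g. "2015-07-23T21:59"
            category = "uncategorized"
            for name, pat in CATEGORIES:
                if pat.search(line):
                    category = name
                    break
            rates[(minute, category)] += 1

    for (minute, category), count in sorted(rates.items()):
        print(minute, category, count)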
Part of categorizing the logs can be normalizing them. If you parse the logs, you can identify all 'login' messages from your different apps, extract the useful info from them, and output a message in the same format for all logins, no matter what the source. This makes it much easier to spot issues and alert on problems.
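For example, a sketch of that normalization step; both message formats here are made-up stand-ins for whatever your apps actually emit:

    # Normalize 'login' events from two differently-formatted sources into
    # one common record format. Both regexes are invented stand-ins.
    import re

    PATTERNS = [
        # sshd-style: "Accepted password for alice from 192.0.2.7 port 4242"
        re.compile(r"Accepted \w+ for (?P<user>\S+) from (?P<ip>\S+)"),
        # hypothetical web app: "user=alice ip=192.0.2.7 action=login ok"
        re.compile(r"user=(?P<user>\S+) ip=(?P<ip>\S+) action=login"),
    ]

    def normalize(line):
        for pat in PATTERNS:
            m = pat.search(line)
            if m:
                return "LOGIN user=%s ip=%s" % (m.group("user"), m.group("ip"))
        return None

    for raw in ["Accepted password for alice from 192.0.2.7 port 4242 ssh2",
                "user=bob ip=198.51.100.3 action=login ok"]:
        print(normalize(raw))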
A good approach is what Marcus Ranum dubbed "Artificial Ignorance":
Start with your full feed of logs and sort it to find the most common log messages. If they are significant, categorize those logs and push them off to something that knows that category and can report on it.
Remember that the number of times an insignificant thing happens can itself be significant, so generate a rate for the insignificant events and push that off to be monitored.
Repeat for the next most common log messages.
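Here's a minimal sketch of that first sorting pass: collapse the variable parts of each line into placeholders and count the resulting templates. The masking rules (IPs, hex values, numbers) are just a starting assumption to tune for your own logs:

    # First pass of "Artificial Ignorance": collapse the variable parts of
    # each line into placeholders, then count the templates to find the
    # most common message types.
    import re
    from collections import Counter

    def template(line):
        line = re.sub(r"\b\d+\.\d+\.\d+\.\d+\b", "<ip>", line)
        line = re.sub(r"\b0x[0-9a-f]+\b", "<hex>", line)
        line = re.sub(r"\b\d+\b", "<num>", line)
        return line.strip()

    counts = Counter()
    with open("syslog") as f:
        for line in f:
            counts[template(line)] += 1

    for tmpl, n in counts.most_common(20):
        print("%8d  %s" % (n, tmpl))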
As you progress through this, you will very quickly get to the point where you start spotting log messages that indicate problems. Pass those logs to an event correlation engine to alert on them (and rate limit your alerts so you don't get 5000 pages).
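A real deployment would use an event correlator such as SEC for this; the rate-limiting core is roughly this sketch:

    # Rate-limit alerts so a storm of matching messages sends one page,
    # not 5000. Just the core idea, not a real correlator.
    import time

    class AlertLimiter:
        def __init__(self, min_interval=300):   # at most one page per 5 min
            self.min_interval = min_interval
            self.last_sent = {}
            self.suppressed = {}

        def alert(self, key, message):
            now = time.time()
            if now - self.last_sent.get(key, 0) >= self.min_interval:
                held = self.suppressed.pop(key, 0)
                self.last_sent[key] = now
                print("PAGE: %s (plus %d suppressed)" % (message, held))
            else:
                self.suppressed[key] = self.suppressed.get(key, 0) + 1

    limiter = AlertLimiter()
    for _ in range(5000):                       # one page goes out, not 5000
        limiter.alert("disk-full", "disk full on db01")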
Much faster than you would imagine, you will get to the point where the remaining uncategorized logs are not that significant, and there also aren't very many of them, so you can do something like generate a daily/weekly report of the uncategorized messages and have someone eyeball it for oddities (and keep an eye out for new message types you should categorize).
This seems like a gigantic amount of work, but it actually scales well: the bigger your organization, the more logs you have, but the number of different _types_ of logs grows much more slowly than the total log volume.
> It seems that, to me, these log databases are larger than the actual meat of the data being manipulated in many cases.
That's very common, but it doesn't mean the log data isn't valuable. Remember that I'm talking about a SaaS-type environment (even if the service is only being provided to your own employees), not HPC. HPC and scientific simulations use a lot of CPU and run through a lot of data, but they don't generate much in the way of log info.
For example, your bank records are actually very small (what's your balance, what transactions took place), but the log records of your bank's systems are much larger, because they need to record every time you accessed the system and what you did (or what someone did with your userid). When you then add the need to keep track of what your admins are doing (to be able to show that they are NOT accessing your accounts, and to catch any who try), you end up with a large number of log messages just for routine housekeeping.
But text logs are small, and they compress well (xz compression is running at roughly 100:1 on my logfiles), so it ends up being a lot easier to store the data than you would initially think. If you work at doing this efficiently, you can also use cheap storage, and you'll find that the amount of money you are spending on the logs is a trivial part of your budget.
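If you want to check the ratio on your own logs, here's a quick sketch using Python's lzma module (the xz algorithm); it reads the whole file into memory, which is fine for a spot check:

    # Quick check of how well a log file compresses with xz-style (LZMA)
    # compression; repetitive text logs often compress dramatically.
    import lzma, os

    path = "syslog"                      # any large text log
    raw = os.path.getsize(path)
    with open(path, "rb") as f:
        packed = len(lzma.compress(f.read()))
    print("raw %d bytes, compressed %d bytes, ratio %.0f:1"
          % (raw, packed, raw / packed))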
It doesn't take many problems solved, or frauds tracked down, to pay for it (and that's completely ignoring the value of the logs in the case of lawsuits).