I suspect I already know the answer (as in "no"), but does collectl provide any way to investigate inode consumption/exhaustion issues, i.e. help identify the user/process that allocates file handles over time? If so could someone please provide an example of how to do this? / SenseiC bows out
First off I realize that inode (at least from the documentation) statistics may have some "growing pains", but the example provided in the verbose data section of the documentation (http://collectl.sourceforge.net/Data-verbose.html) shows the following example: Inodes/Filesystem, collectl -si # INODE SUMMARY # Dentries File Handles Inodes # Number Unused Alloc MaxPct Number 40585 39442 576 0.17 38348 but actually if you issue a collectl -si command you don't get that: # collectl --version collectl...
Oh, thanks for your research on this, Mark. We are moving from Graphite to a Telegraf system. The new dashboards should also show available memory; I will check the plugins. Thanks. I understand it's not collectl output.
Are you sure that's coming from collectl? I'm thinking not. In fact I don't see 'available' memory in /proc/meminfo on my ubuntu system, though it is an older kernel, 3.13. I remember way back when I first wrote collectl there was a lot of controversy in the way linux reported memory usage and at one point I tried to see if I could add everything up and come up with the same amount of memory in the system. I never could and still don't know if one can. Exactly what is available memory? I do see free...
thinking more about this, if the speed is set to -1, isn't that a bug, or do you know if this is a known behavior? I do check first to see if the device is in an UP state and ignore it if it isn't. Is this in fact an operational network? And if so, what is the meaning of speed? If the problem is that linux can't determine the speed, perhaps because the driver doesn't report it, I think the simple fix is for collectl to just set the speed to '??', which it will do when the speed isn't reported. to do this, if you want...
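For anyone wanting to see the idea outside of collectl, here is a minimal Python sketch of the fallback described above: read the speed file, and treat a failed read, a non-numeric value, or a non-positive value such as -1 as "not reported". The function name `read_link_speed` and the `UNKNOWN_SPEED` constant are made up for illustration; this is not collectl's code.

```python
from pathlib import Path

# Placeholder used when the speed is not reported (the '??' mentioned in the post).
UNKNOWN_SPEED = "??"


def read_link_speed(speed_file):
    """Return the link speed as an int, or '??' when it cannot be used.

    Hypothetical helper: speed_file would be something like
    /sys/class/net/INTERFACENAME/speed.
    """
    try:
        raw = Path(speed_file).read_text().strip()
    except OSError:
        # The read itself can fail (e.g. "Invalid argument") or the file may be missing.
        return UNKNOWN_SPEED
    try:
        speed = int(raw)
    except ValueError:
        # Anything that is not a plain integer is treated as unreported.
        return UNKNOWN_SPEED
    # -1 (or any non-positive value) means the speed could not be determined.
    return speed if speed > 0 else UNKNOWN_SPEED
```

Calling `read_link_speed("/sys/class/net/ens3/speed")` on the interface from the bug report would then yield '??' instead of a negative number that later arithmetic turns into a bogus maximum.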
thanks, sebastian, sending an email to me might be best as I hadn't known this was a problem. I have seen the invalid argument before and always wondered what was going on and how to deal with it. Something I just now discovered on my own system is I can 'more' the file but can't 'cat' it. I've never seen a negative value returned but can easily check. Clearly needs a little more digging on my part but I will get this into the next version. Is it ok if I send you a prerelease to try out? Just send...
Hello, I have filed a bug report here https://bugs.launchpad.net/ubuntu/+source/collectl/+bug/1709589 about the fact that collectl can't get the correct network statistics when the /sys/class/net/INTERFACENAME/speed value is set to -1. In this case the collectl process outputs the following: Bogus data record skipped for NET:ens3: data on 20170809 at 08:45:13 because the max speed is set to -250 On ubuntu 14.04 or older versions the speed value can't be read so this may be the reason why it works...
I think your suggestion with -P --rawtoo works for me. Thank you! – fany
OK, so I can probably put a script in to grab this data, but it's not ideal. Is this something that can be added in the future? Or is there a way to request this? I know this value is only available in redhat since version 7.. Thanks
OK thanks Mark. I'll see what I can do.
sorry, but I'm not reporting that. I do remember way back when I first wrote collectl there were a lot of questions around accounting for memory usage because the total never seemed to equal all the different pieces. I also remember being very careful not to report redundant data, and I think I felt that, given total memory and the various other pieces, I wasn't sure what value available memory even had. that said, MemAvailable IS recorded in the raw log file and you can either grep for it, though then...
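As a rough illustration of the "grep for it" route, the Python sketch below pulls MemAvailable out of text laid out like /proc/meminfo (`Name:   value kB`). It assumes the lines you are scanning keep that layout; the function name `mem_available_kb` is made up for this example. On kernels that don't expose the field it returns None rather than guessing.

```python
def mem_available_kb(meminfo_text):
    """Return the MemAvailable value in kB, or None if the field is absent.

    Assumes /proc/meminfo style lines such as 'MemAvailable:  7654321 kB'.
    """
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == "MemAvailable":
            parts = rest.split()
            if parts:
                return int(parts[0])
    # Older kernels do not report MemAvailable at all.
    return None
```

A live check would be `mem_available_kb(open("/proc/meminfo").read())`; a None result matches the older-kernel case mentioned elsewhere in this thread.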
Hi all Is there a command to get the MemAvailable value in /proc/meminfo? I don't see it at this point. collectl -smM waiting for 1 second sample... RECORD 1 >>> dubinx-openstack-elastic1 <<< (1501854119.001) (Fri Aug 4 14:41:59 2017) ### MEMORY SUMMARY <------------------------------------Physical Memory------------------------------------------><-----------Swap------------><-------Paging------> Total Used Free Buff Cached Slab Mapped Anon AnonH Commit Locked Inact Total Used Free In Out Fault MajFt...
I do have to apologize for my poor maintenance of the FAQ. The purpose of genplotfiles was to assist in converting raw files to plot files by running the command collectl -p raw-filename -P -f /var/log/collectl The problem with this strategy was as processors and other subsystems began showing up, I saw the raw files getting real large and the playback times growing to sometimes 1/2 or more. Being lazy, a while back I came up with a much better plot file strategy, simply creating them on-the-fly! To...
I am trying to get started with colplot. The FAQ says: There is a utility that ships with colplot in the examples directory called genplotfiles.pl But neither in colplot-5.2.0.src.tar.gz nor in collectl-4.2.0.src.tar.gz I can locate an examples directory or some file with "genplotfiles" in its name anywhere. Therefore it would be great if you could give me a hint on where to find it. Regards fany
Thank you for pointing it out! I am embarrassed that even slipped by.. -mark On Tue, Jul 4, 2017 at 4:50 PM, John McCormack jmccormack@users.sf.net wrote: All looks good. Thanks a million for your help. Much appreciated. John Disk Stats Zero on RedHat 7 https://sourceforge.net/p/collectl/discussion/696864/thread/2b0e573a/?limit=25#c0b9
All looks good. Thanks a million for your help. Much appreciated. John
Great, Thanks for that Mark. I'll let you know how I get on with the new version. John
yes, check out latest release. should be ok now -mark On Jul 3, 2017 3:34 PM, "John McCormack" jmccormack@users.sf.net wrote: Hey Mark Just wondering if you ever got this issue resolved? Cheers John Disk Stats Zero on RedHat 7 https://sourceforge.net/p/collectl/discussion/696864/thread/2b0e573a/?limit=25#2903
Hey Mark Just wondering if you ever got this issue resolved? Cheers John
thanks for your understanding, but i really need to get this fixed sooner rather than later and am trying to figure out the best way to deal with the problem it's causing. I think I'm close. -mark On Fri, May 19, 2017 at 8:33 AM, John McCormack jmccormack@users.sf.net wrote: Cool, good work! No problems my end. I have a version that works for the moment... We are only introducing this into our network now, so good to know there are guys like you there helping out. Thanks again for the help Mark,...
Cool, good work! No problems my end. I have a version that works for the moment... We are only introducing this into our network now, so good to know there are guys like you there helping out. Thanks again for the help Mark, much appreciated!
ok, I found the problem and it looks like it IS something I introduced in the newer version, sorry about that. Sometimes it just takes longer to find the obvious. thanks for your patience with all this. stay tuned and I'll try to get a patch out a little later today. -mark On Thu, May 18, 2017 at 11:48 AM, John McCormack jmccormack@users.sf.net wrote: BTW, I first noticed this when I was running as a daemon with this config DaemonCommands = -f /var/log/collectl/performance-tab -r00:00,0 -F1 -s+YZcCDN...
Thanks for the work Mark.. I know we are getting there... :-) So it looks like -sD works while -sd doesn't. These outputs were taken at the same time. collectl -sD waiting for 1 second sample... sdb Filter: Ignore: sda Filter: Ignore: dm-0 Filter: Ignore: dm-1 Filter: Ignore: sdc Filter: Ignore: sdd Filter: Ignore: sde Filter: Ignore: dm-2 Filter: Ignore: sdb Filter: Ignore: sda Filter: Ignore: dm-0 Filter: Ignore: dm-1 Filter: Ignore: sdc Filter: Ignore: sdd Filter: Ignore: sde Filter: Ignore: dm-2...
BTW, I first noticed this when I was running as a daemon with this config DaemonCommands = -f /var/log/collectl/performance-tab -r00:00,0 -F1 -s+YZcCDN -oz -P --interval 10 My performance-tab-20170516.tab file was reporting zero, but the performance-tab-20170516.dsk was reporting fine.
Thanks for the work Mark.. I know we are getting there... :-) So it looks like -sD works while -sd doesn't. These outputs were taken at the same time. collectl -sD waiting for 1 second sample... sdb Filter: Ignore: sda Filter: Ignore: dm-0 Filter: Ignore: dm-1 Filter: Ignore: sdc Filter: Ignore: sdd Filter: Ignore: sde Filter: Ignore: dm-2 Filter: Ignore: sdb Filter: Ignore: sda Filter: Ignore: dm-0 Filter: Ignore: dm-1 Filter: Ignore: sdc Filter: Ignore: sdd Filter: Ignore: sde Filter: Ignore:...
one more thing to try. there is an optional switch to collectl called --dskfilt which is how you actually set some of those variables in my previous email. by default it's blank but can be overridden in collectl.conf. that switch can tell collectl which disks to keep or ignore. in any event, if you run collectl -d1, it will tell you the values of keep and ignore like this: DskFilt - Ignore: Keep: and if I run collectl with -d1 --dskfilt foo you'll see this: DskFilt - Ignore: Keep: foo which says only...
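To make the keep/ignore idea concrete, here is a small Python sketch of how a filter like this can zero out summary totals while the per-disk numbers stay intact. The matching rules (regular expressions, ignore checked before keep) and the names `include_in_summary` and `summary_total` are assumptions for illustration only, not collectl's actual logic.

```python
import re


def include_in_summary(disk_name, keep="", ignore=""):
    """Decide whether a disk counts toward the summary (assumed semantics)."""
    if ignore and re.search(ignore, disk_name):
        return False
    if keep and not re.search(keep, disk_name):
        return False
    # With both patterns blank, every disk is included.
    return True


def summary_total(per_disk, keep="", ignore=""):
    """Sum a per-disk counter over the disks that pass the filter."""
    return sum(
        value
        for name, value in per_disk.items()
        if include_in_summary(name, keep, ignore)
    )
```

With `per_disk = {"sda": 10, "dm-0": 10, "sdb": 5}`, a keep pattern that matches no disk name gives a total of 0 even though every disk shows activity, which is the kind of -sD versus -sd mismatch worth ruling out.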
Looking more closely at the code, as soon as the stats are computed, which we now know is being done, a test is made to decide whether or not to include those numbers in the summaries, which for you it looks like they're not! The quickest way to see what's going on is to insert this line of code around line 3388 in /usr/share/formatit.ph like this: # Apply filters to summary totals, explicitly ignoring those we don't want print "$diskName Filter: $dskFilt Ignore: $dskFiltIgnore\n"; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<...
this is good, we're making progress, whether you believe it or not ;) also means you can probably stop using -d4 and remove the debugging code. so clearly -sD is showing disk activity and -sd is not, so there must be something happening in the totals. As an example, here's a single sample from my system RECORD 1 >>> blkjak <<< (1495117501.001) (Thu May 18 10:25:01 2017) DISK SUMMARY (/sec) KBRead RMerged Reads SizeKB KBWrite WMerged Writes SizeKB 768 0 6 128 0 0 0 0 DISK STATISTICS (/sec) <---------reads---------------><---------writes--------------><--------averages-------->...
Write Test collectl <----CPU[HYPER]-----><----------Disks-----------><----------Network----------> cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 0 0 1296 1624 0 0 0 0 11 134 3 3 sdb proc:8717375 last: 8626245 writes: 45565 sda proc:2938390095 last: 2938390095 writes: 0 dm-0 proc:2938385224 last: 2938385224 writes: 0 dm-1 proc:0 last: 0 writes: 0 sdc proc:0 last: 0 writes: 0 sdd proc:0 last: 0 writes: 0 sde proc:0 last: 0 writes: 0 dm-2 proc:8717375 last: 8717373 writes: 1...
Here we go.. Going to split this into 2 posts.. So I performed a read test on sdd collectl -sdD -d4 waiting for 1 second sample... 1495114529.001 <<< disk 8 16 sdb 342 0 3605 285 214828 16786 2425790 15927 0 3594 16185 sdb proc:2425790 last: 0 writes: 1212895 disk 8 0 sda 7552 26 1443798 93280 69497415 2360514 2938281816 33866304 0 4649538 33926452 sda proc:2938281816 last: 0 writes: 1469140908 disk 253 0 dm-0 6151 0 418247 55581 71857882 0 2938276945 35414397 0 4700157 35903288 dm-0 proc:2938276945...
not sure what's going on with your last post as it showed up requiring approval, which I did but the reply bounced, so I'm replying to a previous message. If I'm reading this correctly collectl IS working as advertised. I don't see a whole lot of disk activity but I do see some. Looking at the text from -sD -d4 I see this subset of output: disk 8 0 sda 6611 26 485958 57984 69393170 2356842 2925505656 31251165 0 4620929 31276109 sda proc:2925505656 last: 2925505508 writes: 74 sda 0 0 0 0 0 74 0 14...
Hey Mark Here are some results.. Hope you can read this collectl waiting for 1 second sample... sdb proc:2423354 last: 0 writes: 1211677 sda proc:2925448314 last: 0 writes: 1462724157 dm-0 proc:2925443443 last: 0 writes: 1462721721.5 dm-1 proc:0 last: 0 writes: 0 sdc proc:0 last: 0 writes: 0 sdd proc:0 last: 0 writes: 0 sde proc:0 last: 0 writes: 0 dm-2 proc:2423354 last: 0 writes: 1211677 sdb proc:2423354 last: 2423354 writes: 0 sda proc:2925450011 last: 2925448314 writes: 848.5 dm-0 proc:2925445140...
Hey Mark, Apologies for the delay.. I'll update the ticket soon
john - not to be a pest but I'll be away next week and would love to get this nailed down. Can you rerun with that line of debugging code I recommended? -mark On Wed, May 17, 2017 at 10:13 AM, Mark Seger markseger@users.sf.net wrote: you really need to put in that print statement, but when I did I realized it wasn't reporting what was in /proc/diskstats but rather just the values for the interval which is not right. Much better to do the following which includes the disk name as well as the value...
you really need to put in that print statement, but when I did I realized it wasn't reporting what was in /proc/diskstats but rather just the values for the interval which is not right. Much better to do the following which includes the disk name as well as the value from diskstats as well as the computed difference between the current and last sample. as an added bonus you can then grep on the disk name and see output like this: mjs@blkjak:/tmp/collectl-4.1.2$ ./collectl -sD -d4 | grep sda disk...
Hey Mark I just checked my Ubuntu machine, and it looks like I am having the same issue. I was running an older version. With version 4.1.2 I have this disk issue. Anyway, this is a bit weird.. So there are changes on the sda device, but then the collectl -p shows nothing [root@geodb02 tmp]# zcat geodb02-20170517-090022.raw.gz | grep sda NumDisks: 8 DiskNames: sdb sda dm-0 dm-1 sdc sdd sde dm-2 disk 8 0 sda 6611 26 485958 57984 68187427 2306962 2874686782 30673938 0 4542030 30699418 disk 8 0 sda 6611...
so here's something more to look at, the actual math happens in /usr/share/collectl/formatit.ph if you look around line 3356, there's a chunk of code that looks like this, something I haven't changed in over 15 years, or at least not in recent memory: $dskRead[$dskIndex]= fix($dskFields[0]-$dskFieldsLast[$dskIndex][0]); $dskReadMrg[$dskIndex]= fix($dskFields[1]-$dskFieldsLast[$dskIndex][1]); $dskReadKB[$dskIndex]= fix($dskFields[2]-$dskFieldsLast[$dskIndex][2])/2; $dskReadTicks[$dskIndex]= fix($dskFields[3]-$...
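For readers who don't speak Perl, the same per-interval arithmetic can be sketched in Python: subtract the previous sample's counters from the current ones, and halve the sector counts to get KB (diskstats sectors are 512 bytes). The function names are made up for this example, and the sketch deliberately leaves out whatever fix() does, which I'm assuming is counter-wrap handling.

```python
def parse_diskstats_line(line):
    """Split a diskstats line into (device name, list of integer counters).

    Accepts both plain /proc/diskstats lines ('8 0 sda ...') and the
    'disk 8 0 sda ...' form shown in the -d4 output in this thread.
    """
    parts = line.split()
    if parts and parts[0] == "disk":
        parts = parts[1:]
    return parts[2], [int(x) for x in parts[3:]]


def interval_stats(prev_line, curr_line):
    """Difference two samples of the same disk (no counter-wrap handling)."""
    name, prev = parse_diskstats_line(prev_line)
    _, curr = parse_diskstats_line(curr_line)
    delta = [c - p for c, p in zip(curr, prev)]
    return {
        "name": name,
        "reads": delta[0],
        "read_merged": delta[1],
        "read_kb": delta[2] / 2,   # sectors read, 512 bytes each
        "read_ticks": delta[3],
        "writes": delta[4],
        "write_merged": delta[5],
        "write_kb": delta[6] / 2,  # sectors written, 512 bytes each
    }
```

This lines up with the debug output posted for sda in this thread: proc:2925505656 minus last:2925505508 is 148 sectors, and 148 / 2 gives the 74 shown as "writes: 74".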
works fine on suse vm, but that may not mean anything. here's another thought, how about you collect 5 samples and save in a raw file like this: collectl -sd -c5 -f/tmp then zcat the raw file and confirm it has correctly recorded the diskstats, which I'm betting it did since -d4 prints the data right before writing it. next, try playing back the raw file and see if you see stats collectl -p /tmp/rawfilename since the logic is the same, in theory ;), it should also show zeros. And if so, send me a...
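If zcat and grep aren't handy, the check on the raw file can be sketched in Python as below. It assumes the raw file is gzip-compressed and that disk samples are recorded as lines starting with `disk <major> <minor> <name>`, as in the zcat output posted in this thread; the function name `disk_records` is made up.

```python
import gzip


def disk_records(raw_gz_path, disk_name):
    """Return the recorded diskstats lines for one disk from a gzipped raw file."""
    records = []
    with gzip.open(raw_gz_path, "rt") as fh:
        for line in fh:
            fields = line.split()
            # Match only real samples, not header lines that merely mention the name.
            if len(fields) > 3 and fields[0] == "disk" and fields[3] == disk_name:
                records.append(line.rstrip("\n"))
    return records
```

If consecutive records for a disk show counters changing while playback still prints zeros, the raw data is fine and the problem is in how it is processed.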
wow, I'm at a loss here and don't have a copy of redhat to try it on. to be clear you're saying it works ok on ubuntu, because that's what I'm running and it seems fine. I do have a suse vm and can try on it, but this is really odd. I'll also take a deeper look at the code and maybe spot something -mark On Wed, May 17, 2017 at 8:06 AM, John McCormack jmccormack@users.sf.net wrote: Hey Mark Thanks for the response.. I checked what you asked and everything seems to be in order. I believe there is an...
Hey Mark Thanks for the response.. I checked what you asked and everything seems to be in order. I believe there is an issue with the version. I downloaded collectl-4.0.4 and it started producing stats straight away. With collectl-4.0.4 collectl-4.0.4]# collectl waiting for 1 second sample... <--------CPU--------><----------Disks-----------><----------Network----------> cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 2 0 890 1289 0 0 0 0 24 179 4 40 8 0 1141 1466 0 0 72 4 37...
I'd suggest trying a couple of things, first and foremost look at /proc/diskstats and make sure your disks are listed there. Is it possible they're not local? Next try running iostat if you have it (part of the sysstat package I think). Collectl and iostat do virtually identical processing so if collectl isn't seeing disks I'd suspect iostat wouldn't be either. Furthermore you did think to run with -sD to show individual disks which would have been my next suggestion ;) Finally, if /proc/diskstats...
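The first check suggested above, confirming the disks actually appear in /proc/diskstats, can be sketched in a few lines of Python. The helper names `listed_disks` and `missing_disks` are made up for this example; the only assumption about the file is that the device name is the third whitespace-separated field.

```python
def listed_disks(diskstats_text):
    """Return device names found in /proc/diskstats style text."""
    names = []
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.append(fields[2])  # major, minor, then the device name
    return names


def missing_disks(diskstats_text, expected):
    """Return the expected disks that are not listed at all."""
    present = set(listed_disks(diskstats_text))
    return [name for name in expected if name not in present]
```

For a live check, `missing_disks(open("/proc/diskstats").read(), ["sda", "sdb"])` returning an empty list means both disks are listed, so the zeros have to come from somewhere later in the processing.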
Hey All I am having an issue where collectl disk stats are showing as zero. Anyone any ideas what is going on here? I have an ubuntu server and all is fine.. 2 different hosts, 2 different versions of redhat, 2 different versions of collectl collectl-4.1.3 & collectl-4.1.2 [root@host]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.3 (Maipo) [root@host]# uname -r 3.10.0-514.el7.x86_64 [root@host ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.1 (Maipo) [root@host...
glad it worked out for you. I'd still prefer a pretty solution so maybe someday I'll...
Thanks Mark! This is what I have: nic0, nic1, nic2 collectl -sN does show the network...
I may have an answer and it's not a pretty one. ;( Seems like several years ago a...
I have read over this: http://collectl.sourceforge.net/NetworkStats.html, which covers...
I just tried sending to your SF email address and it bounced. feel free to send a...
0 out of 1 host connected is a bad thing, means it couldn't connect. may all be related...
funny you should mention that as I'm using the host file as well! turned out on my...
Yes, I can do that. Please send me a patched version. One other interesting thing...
good, but I'm still concerned why you were having problems. I just tried this and...
Hi Mark, As a hack, I replaced myreturnaddr with # my $myReturnAddr=($retaddr eq...
Still no luck - [root@mydbmachine cdr]# colmux -address mydbmachine -command "-sc"...
I'm thinking you may have stumbled on a different problem, sigh... I do wish more...
Hi Mark, I changed it and still could not get it to work. [root@mydbmachine cdr]# grep...
I think I found it. For some reason your version of ifconfig is returning a line...
Hi Mark, Can you please let me know any way to fix this? KM
[root@se-r430-1 scripts]# ifconfig -a em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>...
good to hear, so the short answer is it works with --retaddr so you're at least ok...
With -retaddr works. [root@se-r430-1 scripts]# colmux -retaddr 10.0.14.94 -addr 10.0.14.94...
Hi Mark, client.pl runs fine. However colmux fails again. [root@se-r430-1 scripts]#...
so actually here is step 2, I had to try it out to make sure it would work. again,...
actually digging deeper into colmux I think I may know what's going on. The magic,...
This is a tough one and may take several iterations of q/a to get to the bottom of...
Debug mode - [root@se-r430-2 ~]# colmux -address localhost -debug 5 Command: '/usr/bin/ssh...
Collectl runs fine. Colmux starts up and I see TCP port 2655 open. Iptables is not...
Colmux errors out. [root@mydbmachine ~]# colmux -address localhost substr outside...
cool, I haven't seen this in years! collectl basically gets these stats from /proc/net/dev....
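As a point of reference for what lives in /proc/net/dev, here is a Python sketch that pulls the byte and packet counters per interface. It relies on the standard layout (two header lines, then `iface:` followed by 8 receive and 8 transmit counters); the function name `parse_net_dev` is made up and this is not how collectl itself parses the file.

```python
def parse_net_dev(text):
    """Map interface name to its rx/tx byte and packet counters."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain '|' but no ':'
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16 or not all(f.isdigit() for f in fields[:16]):
            continue  # skip anything that does not look like a counter row
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats
```

These are running totals, so rates such as KBIn or PktOut come from differencing two samples taken one interval apart.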
Hi, When I run the command line below, I get the output shown. The data is logged...
I only just started using statsd. I did notice a few repeater/relay/bridge statsd...
very cool stuff, I like it. My only personal hangup with statsd, and maybe you have...
Hello Mark and thank you for collectl, I had the need to send some disk device stats...
Thanks for the reply Mark. I was actually able to get the timestamp from the variable...
all very easy to do. the best thing is to steal some code from the other modules,...
Hello, I have created an export module that uses variables from the collectl/formatit.ph...
ok thanks for confirming, now I need to figure out the best way to efficiently deal...
Hi Mark - sorry for never responding to your request. I got swept up by other things...
Nothing to add at the moment. Thanks. Also, I've explicitly added a public domain...
First of all, thank you again for your time and help. You are right, I don't know...
ok, lots going on. first and foremost I think you're running a version of colmux...
I have collectl running on every node of the cluster, the log generated is then to...
I would like to push a new release out soon, so if you have some words to add as...
Hi Mark, thanks for the response. That's a good point actually and I'm usually a...
a couple of things. are you only interested in generating data via colmux? if so,...
Hi, It's been too long (the project was on hold) thanks for your help. I collect...
happy to do so. the one thing I saw on the page you linked me to is a copyright by...
Patch adding e=escape graphite option