From: Hernan L. <her...@gm...> - 2016-06-23 02:38:06
Hello Mark,
I downloaded the latest version from SourceForge, and it seems to fix these
issues, even with RAW files generated by the (older?) version available on
Debian. I will use this version going forward, so we can declare the problem
resolved.
Thanks for your help,
Hernan
On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <mj...@gm...> wrote:
> Wow, that's a tricky one. Quite honestly, colmux has been so solid for me
> that I haven't looked at the code in ages, but that doesn't mean anything
> either. It's also amusing to note that I had totally forgotten it supported
> the hostname address syntax you're using. ;) That allowed me to use
> essentially the same command you are, with one note: I also added -test and
> see that columns 10 and 20 are different from what you're saying. Maybe you
> have a different kernel? I'm on 4.4.7-1-amd64-hpelinux, which is the Linux we
> use for our Helion Cloud and is essentially Debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20
>
>              [CPU:0]Idle%           [CPU:1]Soft%
> #Time    1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
> 12:08:27     -1     -1     -1 |     -1     -1     -1
> 12:08:28     -1     -1     -1 |     -1     -1     -1
> 12:08:29     95     -1    100 |      0     -1      0
> 12:08:30     95     97     98 |      0      0      0
> 12:08:31     97    100    100 |      0      0      0
> 12:08:32     87    100     89 |      0      0      0
> 12:08:33    100    100    100 |      0      0      0
> 12:08:34    100    100     99 |      0      0      0
> 12:08:35    100     97     97 |      0      0      0
> 12:08:36     99     98    100 |      0      0      0
>
> What you didn't say is whether this fails all the time or intermittently. If
> it's intermittent, it will indeed be hard to track down, but there is hope
> too ;)
>
> Have you tried playing back a file with colmux yet? If not, you can
> simply rerun the command but include -p and point it to the raw files. One
> thing I did discover is that I think I introduced a bug some time in the
> past: the hostname portion of the playback file string needs to start with
> a wildcard rather than have one anywhere in the middle. And to make matters
> worse, I found a second bug: playback uses the wrong column. That needs
> more digging too. ;(
>
> BUT if I add 1 to each column, I think this looks right, as long as you
> ignore what the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P -p '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
>              [CPU:0]Totl%           [CPU:1]Steal%
> #Time    1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
>              99     99    100 |      0      0      0
>              98     99     97 |      0      0      0
>              94     98     94 |      0      0      0
>              94     93     92 |      0      0      0
>              99     94     98 |      0      0      0
>              99    100     99 |      0      0      0
>              99    100    100 |      0      0      0
>
> And since this is a playback command, you can also use time ranges to limit
> what is displayed, which may help zero in on where in the data the problem
> is. Then maybe you could even send me a subset of the problem raw file [use
> collectl --extract to create a new raw file from a time slice of an old
> one], and maybe I can track down why this is happening.
>
> -mark
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <her...@gm...> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>> collectl -sC -oT -P
>>
>> This gives us 282 columns (the machines have 28 CPUs).
>>
>> Now we want to run a colmux command to see the idle time of CPUs 0 and 1
>> on 3 machines. These are columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>> colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>> Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the "lasttime" field of the data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>   'lasttime' => [
>>     '',
>>     '20160615'
>>   ],
>>   'maxinst' => [
>>     -1,
>>     0
>>   ],
>>   'lastinst' => [
>>     -1,
>>     0
>>   ],
>>   'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea what the problem may be
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>> _______________________________________________
>> Collectl-interest mailing list
>> Col...@li...
>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>
>>
>
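The column numbers discussed in this thread can be reproduced with simple arithmetic. Below is a minimal Python sketch, assuming a layout inferred only from the figures quoted above and not from the collectl or colmux source: 1-based column numbers, two leading non-CPU columns, then one fixed-size block of fields per CPU, with Idle% as the 8th field of each block. Under those assumptions, 2 + 28 * 10 = 282 columns, and CPU 0 and CPU 1 Idle% land on columns 10 and 20, matching the original report. The helper names (`total_columns`, `column_for`, `locate`) are hypothetical.

```python
from typing import Tuple

# Assumed layout, inferred from the numbers quoted in this thread rather than
# from the collectl/colmux source: 1-based column numbers, two leading
# non-CPU columns, then one fixed-size block of fields per CPU.
LEADING_COLUMNS = 2   # assumption: 282 = 2 + 28 CPUs * 10 fields
FIELDS_PER_CPU = 10   # assumption: per-CPU block size seen with collectl V3.6.9-1
IDLE_POSITION = 8     # assumption: Idle% is the 8th field of each CPU block


def total_columns(num_cpus: int, fields_per_cpu: int = FIELDS_PER_CPU) -> int:
    """Total number of plot-format columns for a machine with num_cpus CPUs."""
    return LEADING_COLUMNS + num_cpus * fields_per_cpu


def column_for(cpu: int, position: int, fields_per_cpu: int = FIELDS_PER_CPU) -> int:
    """1-based column of the field at 1-based `position` in CPU `cpu`'s block."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    if not 1 <= position <= fields_per_cpu:
        raise ValueError("position must fall inside one CPU block")
    return LEADING_COLUMNS + cpu * fields_per_cpu + position


def locate(column: int, fields_per_cpu: int = FIELDS_PER_CPU) -> Tuple[int, int]:
    """Inverse of column_for: return (cpu, position) for a 1-based column."""
    if column <= LEADING_COLUMNS:
        raise ValueError("column is one of the leading non-CPU columns")
    offset = column - LEADING_COLUMNS - 1  # 0-based offset into the CPU blocks
    return offset // fields_per_cpu, offset % fields_per_cpu + 1


if __name__ == "__main__":
    print(total_columns(28))                                   # 282
    print([column_for(cpu, IDLE_POSITION) for cpu in (0, 1)])  # [10, 20]
    print(locate(20))                                          # (1, 8)
    print(locate(20, fields_per_cpu=12))                       # (1, 6)
```

With a 10-field block, column 20 maps back to the 8th field of CPU 1; with a hypothetical 12-field block, the same column maps to the 6th field. This illustrates how a different per-CPU block size (for example, from a different collectl version or kernel) would change which header a given column number points at.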