golly-test Mailing List for Golly

It was definitely a good idea to use SSE. Having sorted out the
denormals problem we got 1,000fps (up from 100fps). After moving the
code to SSE (with help from Robert Munafo) I'm getting 8,000 fps (on
an 8-core Intel i7 running Win7 64-bit). With OpenMP on top of SSE I
get 14,000 fps, the same speed as our OpenCL version (on an nVidia
GeForce 460). That's only a 1.8x speedup from OpenMP (14,000 vs 8,000
fps), so I'm hoping that handling the threads ourselves will be faster.
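
For reference, here is a minimal sketch of one way to flush denormals to zero under SSE. It assumes an x86 compiler that provides <xmmintrin.h>. The helper names enable_flush_to_zero and flush_to_zero_enabled are hypothetical and are not taken from the SpeedComparisons code.

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE */

/* Hypothetical helper: set the flush-to-zero bit in the SSE control
 * register (MXCSR) so that results which would be denormal become zero.
 * MXCSR is per-thread, so worker threads may need to call this too. */
static void enable_flush_to_zero(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}

/* Hypothetical helper: report whether flush-to-zero is currently on
 * for the calling thread. */
static int flush_to_zero_enabled(void)
{
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON;
}
```

Calling enable_flush_to_zero() once before the main loop should be enough for single-threaded code. This only affects arithmetic done with SSE instructions, not x87 code.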

The latest code is here:
http://code.google.com/p/reaction-diffusion/source/browse/trunk#trunk%2FSpeedComparisons

All suggestions welcome. I'm sure the OpenCL version can be improved,
for example!
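
To illustrate what "OpenMP on top of SSE" can look like, here is a rough sketch, not the actual reaction-diffusion kernel. The function add_scaled_sse and its update rule out[i] = a[i] + dt * b[i] are made up for illustration. The OpenMP pragma is ignored if the compiler is not run with OpenMP enabled, in which case the loop simply runs on one thread.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, ... */

/* Hypothetical kernel: out[i] = a[i] + dt * b[i] for i in [0, n).
 * Four floats are processed per SSE instruction, and OpenMP splits
 * the blocks of four across threads. */
static void add_scaled_sse(const float *a, const float *b, float *out,
                           int n, float dt)
{
    const __m128 vdt = _mm_set1_ps(dt);  /* dt copied into all four lanes */
    const int n4 = n - (n % 4);          /* largest multiple of 4 <= n */
    int i;

    #pragma omp parallel for
    for (i = 0; i < n4; i += 4) {
        /* Unaligned loads and stores, so the arrays need no special alignment. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, _mm_mul_ps(vb, vdt)));
    }

    /* Scalar tail for the last n % 4 elements. */
    for (i = n4; i < n; ++i)
        out[i] = a[i] + dt * b[i];
}
```

Unaligned loads are used here to keep the sketch simple. Aligned storage with _mm_load_ps may be faster on some processors.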

On 19 September 2011 23:29, Tim Hutton <tim...@gm...> wrote:
> None of this is in Golly yet. I'm using my reaction-diffusion project
> as a testbed. And to avoid complicating that project too much I've now
> moved the OpenCL stuff and the other comparisons into a separate
> SpeedComparisons folder.
>
> I'm having a look at some SSE tutorials. It's good for me to learn
> this stuff anyway of course. Some of my current questions: Is SSE
> Intel-specific? Is SSE2 anything to worry about? How does SSE relate
> to all the other on-chip stuff like MMX? How well supported is SSE on
> different operating systems?
>
> On 19 September 2011 17:41, Tom Rokicki <ro...@gm...> wrote:
>> Well, frankly, we should be using SSE anyway.  I'm sorry I didn't get time
>> to spend on it this weekend, but in general using SSE is not bad at all.
>> You set up your data correctly, and normally the compiler will swoop in
>> and vectorize.  If that doesn't work there are intrinsics.
>>
>> Even with SSE you need to deal with denormals.
>>
>> There's a control bit you can set to tell it to flush denormals to zero.
>>
>> I might get some time for this soon; I'm not sure.  Frankly, if I were to
>> experiment with this, I'd pull it out of Golly and put it in a tiny
>> standalone program to experiment with . . .
>>
>> On Mon, Sep 19, 2011 at 9:34 AM, Tim Hutton <tim...@gm...> wrote:
>>> Is there a way to avoid the problem, do you know? I'm guessing doubles
>>> have the same problem, so we've only reduced the problem, not avoided
>>> it completely.
>>>
>>> Should I manually round small values to zero?
>>>
>>> On 19 September 2011 16:28, Tom Rokicki <ro...@gm...> wrote:
>>>> I was going to suggest the issue is denormals.  I think it's pretty probable.
>>>>
>>>> On Mon, Sep 19, 2011 at 7:37 AM, Tim Hutton <tim...@gm...> wrote:
>>>>> On 18 September 2011 00:38, Tim Hutton <tim...@gm...> wrote:
>>>>>> On 17 September 2011 23:24, Andrew Trevorrow <an...@tr...> wrote:
>>>>>>>>> Single core: 1,600 fps
>>>>>>>>> OpenCL on AMD Radeon HD 6970M: 140,000 fps
>>>>>>>>
>>>>>>>> Wow that's a *lot* faster in the single-core case than on my machine.
>>>>>>>> How interesting.
>>>>>>>
>>>>>>> The figure of 1,600 fps is after the spots have filled the window.
>>>>>>> When the single core version first starts up I see fps rates of about 450.
>>>>>>> The rate then gradually increases as the spots fill the window.
>>>>>>> Is that what you see?
>>>>>>
>>>>>> Yes, I do. My reported figure was from near the beginning, so I will
>>>>>> have another look at that.
>>>>>
>>>>> Just left it running for a bit and I get 1100 fps on a single core.
>>>>> But it's under 100 fps at the slowest point.
>>>>> (This is on an Intel i7-2600, rated at 3.4 GHz.)
>>>>>
>>>>> The speed seems to be related to the content of the scene. Change the
>>>>> if-statement in init() to read:
>>>>>
>>>>> if(hypot(i%50-25,j%50-25)<=frand(2,5))
>>>>>
>>>>> (this puts lots of dots into the starting frame instead of one)
>>>>>
>>>>> On my machine this immediately hits 1100 fps and stays there. With
>>>>> OpenMP I now get 5000 fps. Still only 14000 fps in OpenCL though.
>>>>>
>>>>> Maybe we are hitting the problem of 'denormals' - floating point underflow.
>>>>> http://www.cygnus-software.com/papers/x86andinfinity.html
>>>>> But Visual Studio 9 Pro is saying "D9002 : ignoring unknown option
>>>>> '/Qftz'" so maybe there's a reason it's not supported.
>>>>> Someone here suggested using doubles instead of floats:
>>>>> http://forums.nvidia.com/index.php?s=115f66f09e4dcc7e6771d8a06ff57246&showtopic=188850&view=findpost&p=1173133
>>>>> I don't understand why that causes a value-dependent slowdown but
>>>>> changing to doubles throughout does greatly improve things in my
>>>>> build. And they suggest that using SSE avoids this problem too. I'm
>>>>> now going to try to understand what SSE is.
>>>>>
>>>>>
>>>>> --
>>>>> Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
>>>>> Learn about the latest advances in developing for the
>>>>> BlackBerry® mobile platform with sessions, labs & more.
>>>>> See new tools and technologies. Register for BlackBerry® DevCon today!
>>>>> http://p.sf.net/sfu/rim-devcon-copy1
>>>>> _______________________________________________
>>>>> Golly-test mailing list
>>>>> Gol...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/golly-test
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --  http://cube20.org/  --  http://golly.sf.net/  --
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/
>>>
>>>
>>
>>
>>
>> --
>> --  http://cube20.org/  --  http://golly.sf.net/  --
>>
>>
>
>
>
> --
> Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/
>

-- 
Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/


For exploring cellular automata like Conway's Game of Life.

golly-test — Mail list for testers of golly