From: Tim H. <tim...@gm...> - 2011-09-29 10:33:08
|
It was definitely a good idea to use SSE. Having sorted out the denormals problem we got 1,000 fps (up from 100 fps). After moving the code to SSE (with help from Robert Munafo) I'm getting 8,000 fps (on an 8-core Intel i7 running Win7 64-bit). With OpenMP on top of SSE I get 14,000 fps, the same speed as our OpenCL version (on an nVidia GeForce 460). That's only a 1.8x speedup from OpenMP, so I'm hoping that handling the threads ourselves will be faster.

The latest code is here:
http://code.google.com/p/reaction-diffusion/source/browse/trunk#trunk%2FSpeedComparisons

All suggestions welcome. I'm sure the OpenCL version can be improved, for example!

On 19 September 2011 23:29, Tim Hutton <tim...@gm...> wrote:
> None of this is in Golly yet. I'm using my reaction-diffusion project as a testbed. And to avoid complicating that project too much I've now moved the OpenCL stuff and the other comparisons into a separate SpeedComparisons folder.
>
> I'm having a look at some SSE tutorials. It's good for me to learn this stuff anyway of course. Some of my current questions: Is SSE Intel-specific? Is SSE2 anything to worry about? How does SSE relate to all the other on-chip stuff like MMX? How well supported is SSE on different operating systems?
>
> On 19 September 2011 17:41, Tom Rokicki <ro...@gm...> wrote:
>> Well, frankly, we should be using SSE anyway. I'm sorry I didn't get time to spend on it this weekend, but in general using SSE is not bad at all. You set up your data correctly, and normally the compiler will swoop in and vectorize. If that doesn't work there are intrinsics.
>>
>> Even with SSE you need to deal with denormals.
>>
>> There's a control bit you can set to tell it to flush denormals to zero.
>>
>> I might get some time for this soon; I'm not sure. Frankly, if I were to experiment with this, I'd pull it out of Golly and put it in a tiny standalone program to experiment with . . .
>>
>> On Mon, Sep 19, 2011 at 9:34 AM, Tim Hutton <tim...@gm...> wrote:
>>> Is there a way to avoid the problem, do you know? I'm guessing doubles have the same problem, so we've only reduced the problem, not avoided it completely.
>>>
>>> Should I manually round small values to zero?
>>>
>>> On 19 September 2011 16:28, Tom Rokicki <ro...@gm...> wrote:
>>>> I was going to suggest the issue is denormals. I think it's pretty probable.
>>>>
>>>> On Mon, Sep 19, 2011 at 7:37 AM, Tim Hutton <tim...@gm...> wrote:
>>>>> On 18 September 2011 00:38, Tim Hutton <tim...@gm...> wrote:
>>>>>> On 17 September 2011 23:24, Andrew Trevorrow <an...@tr...> wrote:
>>>>>>>>> Single core: 1,600 fps
>>>>>>>>> OpenCL on AMD Radeon HD 6970M: 140,000 fps
>>>>>>>>
>>>>>>>> Wow that's a *lot* faster in the single-core case than on my machine. How interesting.
>>>>>>>
>>>>>>> The figure of 1,600 fps is after the spots have filled the window. When the single core version first starts up I see fps rates of about 450. The rate then gradually increases as the spots fill the window. Is that what you see?
>>>>>>
>>>>>> Yes, I do. My reported figure was from near the beginning, so I will have another look at that.
>>>>>
>>>>> Just left it running for a bit and I get 1100 fps on a single core. But it's under 100 fps at the slowest point. (This is on an Intel i7-2600, rated at 3.4 GHz.)
>>>>>
>>>>> The speed seems to be related to the content of the scene. Change the if-statement in init() to read:
>>>>>
>>>>> if(hypot(i%50-25,j%50-25)<=frand(2,5))
>>>>>
>>>>> (this puts lots of dots into the starting frame instead of one)
>>>>>
>>>>> On my machine this immediately hits 1100 fps and stays there. With OpenMP I now get 5000 fps. Still only 14000 fps in OpenCL though.
>>>>>
>>>>> Maybe we are hitting the problem of 'denormals' - floating point underflow.
>>>>> http://www.cygnus-software.com/papers/x86andinfinity.html
>>>>> But Visual Studio 9 Pro is saying "D9002 : ignoring unknown option '/Qftz'" so maybe there's a reason it's not supported.
>>>>> Someone here suggested using doubles instead of floats:
>>>>> http://forums.nvidia.com/index.php?s=115f66f09e4dcc7e6771d8a06ff57246&showtopic=188850&view=findpost&p=1173133
>>>>> I don't understand why that causes a value-dependent slowdown but changing to doubles throughout does greatly improve things in my build. And they suggest that using SSE avoids this problem too. I'm now going to try to understand what SSE is.
>>>>>
>>>>> --
>>>>> Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
>>>>> Learn about the latest advances in developing for the
>>>>> BlackBerry® mobile platform with sessions, labs & more.
>>>>> See new tools and technologies. Register for BlackBerry® DevCon today!
>>>>> http://p.sf.net/sfu/rim-devcon-copy1
>>>>> _______________________________________________
>>>>> Golly-test mailing list
>>>>> Gol...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/golly-test
>>>>
>>>> --
>>>> -- http://cube20.org/ -- http://golly.sf.net/ --

--
Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/ |
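The two denormal workarounds mentioned in this thread (rounding small values to zero by hand, and the flush-to-zero control bit) might be sketched as follows. This is only an illustration, not the code in the SpeedComparisons folder: `flush_tiny` and `enable_flush_to_zero` are hypothetical names, and using `FLT_MIN` as the cut-off is an assumption.

```cpp
#include <cfloat>   // FLT_MIN: smallest positive normal float
#include <cmath>    // std::fabs

// Only pull in the SSE header where the compiler says SSE is available.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON
#define SKETCH_HAVE_SSE_FTZ 1
#else
#define SKETCH_HAVE_SSE_FTZ 0
#endif

// Hypothetical helper: manually round values that are too small to be
// normal floats (i.e. denormals) to zero. Normal values pass through.
inline float flush_tiny(float x) {
    return (std::fabs(x) < FLT_MIN) ? 0.0f : x;
}

// Hypothetical helper: set the flush-to-zero bit in the SSE control
// register so that denormal results of SSE arithmetic become zero.
// Returns false when built without SSE support (nothing is changed).
inline bool enable_flush_to_zero() {
#if SKETCH_HAVE_SSE_FTZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return true;
#else
    return false;
#endif
}
```

The manual version costs a compare per cell but works with any compiler. The control bit is part of per-thread CPU state, so with OpenMP or hand-rolled threads the worker threads may need to set it themselves.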
From: Robert M. <mr...@gm...> - 2011-09-28 10:16:15
|
I've been corresponding with Andrew Trevorrow and Tim Hutton about this. Here are some images that should look hauntingly familiar:

http://mrob.com/pub/comp/xmorphia/catalog.html

Those are some of the still-lifes, rotaters and spaceships from the Gray-Scott system at F=0.062, k=0.0609. Movies here:

http://mrob.com/pub/comp/xmorphia/uskate-world.html

I gave a talk on this last year at Rutgers, and my website has a lot more.

On the rulesets issue, my advice is to support formulas that can be compiled just-in-time, either using OpenCL or with something like Clang/LLVM for a CPU-based solution. I guess we could use Java too; some applets I've seen seem pretty efficient.

The formula for Gray-Scott is:

u: Du*del(u) - u*v*v + F*(1-u)
v: Dv*del(v) + u*v*v - (F+k)*v

where Du, Dv are global constants specific to Gray-Scott and F, k are parameters adjustable by the user wishing to explore the possibilities of the system.

del() is a function we should provide. Typical implementations use a 5-cell Von Neumann neighborhood to approximate the Laplacian (the divergence of the gradient, which gives the diffusion term). There is also a 9-cell Moore neighborhood, which does a better job, giving fewer grid-aligned artifacts.

From: Alan Tennant
> Let's make something as simple, generic and powerful as rule tables, or
> somehow use rule tables.

--
Robert Munafo -- mrob.com
Follow me at: gplus.to/mrob - fb.com/mrob27 - twitter.com/mrob_27 - mrob27.wordpress.com - youtube.com/user/mrob143 - rilybot.blogspot.com |
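As a rough sketch of the Gray-Scott formula above, here is one explicit Euler step using the 5-cell Von Neumann approximation for del(). F and k are the values quoted in the message; the values of Du, Dv and the timestep dt, the struct and function names, and the wrap-around edges are all assumptions made for illustration, not anyone's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// F and k are from the message; Du, Dv and dt are illustrative assumptions.
struct GrayScottParams {
    float Du = 0.082f;
    float Dv = 0.041f;
    float F  = 0.062f;
    float k  = 0.0609f;
    float dt = 1.0f;
};

// 5-cell Von Neumann approximation of the Laplacian at (x, y) on a
// row-major w*h grid, with wrap-around (toroidal) edges.
inline float del(const std::vector<float>& g, int w, int h, int x, int y) {
    const int xl = (x + w - 1) % w, xr = (x + 1) % w;
    const int yu = (y + h - 1) % h, yd = (y + 1) % h;
    return g[y * w + xl] + g[y * w + xr] + g[yu * w + x] + g[yd * w + x]
         - 4.0f * g[y * w + x];
}

// Advance u and v by one timestep. New values are written to scratch
// buffers first so every cell sees its neighbours' old values.
inline void gray_scott_step(std::vector<float>& u, std::vector<float>& v,
                            int w, int h, const GrayScottParams& p) {
    assert(w > 0 && h > 0);
    assert(u.size() == static_cast<std::size_t>(w) * h && v.size() == u.size());
    std::vector<float> nu(u.size()), nv(v.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * w + x;
            const float uvv = u[i] * v[i] * v[i];
            const float du = p.Du * del(u, w, h, x, y) - uvv + p.F * (1.0f - u[i]);
            const float dv = p.Dv * del(v, w, h, x, y) + uvv - (p.F + p.k) * v[i];
            nu[i] = u[i] + p.dt * du;
            nv[i] = v[i] + p.dt * dv;
        }
    }
    u.swap(nu);
    v.swap(nv);
}
```

A grid with u = 1 and v = 0 everywhere is a fixed point of this update, which makes a handy sanity check; seeding a patch of non-zero v is what starts the patterns.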
From: Andrew T. <an...@tr...> - 2011-09-28 08:20:17
|
Hi Robert (and welcome):

> So how do I find out if my build errors on a Mac have been seen before?

Ask here. :) There aren't too many people who build Golly on a Mac. I may well be the only person (until now!). You've been a bit unlucky to be trying this just now because I'm in the middle (actually near the end) of switching from a 32-bit Carbon build to a 64-bit Cocoa build, so the BUILD docs and makefile-mac are still in a state of flux.

> I just did port selfupdate on this Sandy Bridge MacBook Pro, and
> subsequently installed CMake (last week) and wxWidgets (last night),
> before trying to build Golly. Nevertheless, the CMake build
> instructions result in (about 60% of the way through the build):
>
> ~/devt/golly/golly/src/wxfile.cpp: In member function 'void
> MainFrame::MySetTitle(const wxString&)':
> ~/devt/golly/golly/src/wxfile.cpp:97: error: 'SetWindowTitleWithCFString'
> was not declared in this scope

Ok, that means your wxWidgets installation is a 32-bit Carbon build (probably version 2.8.12). I think you can persuade cmake to build a 32-bit Golly by doing this:

   cmake -DCMAKE_OSX_ARCHITECTURES=i386 ..

But I would recommend building a 64-bit Cocoa version of wxWidgets. To do this it is best to get the latest wx sources via:

   svn checkout http://svn.wxwidgets.org/svn/wx/wxWidgets/trunk wx-trunk

Then do these steps:

   cd wx-trunk
   mkdir build-osx
   cd build-osx
   ../configure --enable-unicode --disable-shared --with-osx_cocoa
   make                  (takes a while!)
   sudo make install

Building a 64-bit Cocoa version of Golly should then be easy:

   cd /path/to/golly/src
   mkdir cmakedir        (use any name for this dir)
   cd cmakedir
   cmake ..
   make

Let me know how that goes.

Andrew |
From: Robert M. <mr...@gm...> - 2011-09-28 06:15:55
|
I have been working with Tim Hutton on feasibility studies for doing reaction-diffusion systems (our work can be seen on code.google.com/p/reaction-diffusion and mine is in the SpeedComparisons subdirectory).

I noticed that the archives for this list are not searchable. So how do I find out if my build errors on a Mac have been seen before?

You can skip the rest if you aren't interested.

- - -

I got golly via cvs with

   cvs -z3 -d:pserver:ano...@go...:/cvsroot/golly co -P golly

I just did port selfupdate on this Sandy Bridge MacBook Pro, and subsequently installed CMake (last week) and wxWidgets (last night), before trying to build Golly. Nevertheless, the CMake build instructions result in (about 60% of the way through the build):

   ~/devt/golly/golly/src/wxfile.cpp: In member function 'void MainFrame::MySetTitle(const wxString&)':
   ~/devt/golly/golly/src/wxfile.cpp:97: error: 'SetWindowTitleWithCFString' was not declared in this scope

I then tried the autogen.sh/configure instructions, which resulted in a really amusing complaint:

   configure: error: missing objdump

Which is funny because Macs haven't had objdump since MacOS 10.3. So I don't think anyone has tried those instructions on a Mac in quite a while.

Anyway, I also looked at configuring the makefile-mac myself. wxWidgets installs a tool called wx-config (which is in my path by default, like everything else MacPorts installs) and wx-config gave me the info I needed to get makefile-mac working far enough to produce the same "SetWindowTitleWithCFString" error.

So no Golly debugging for me, I guess (-:

- Robert

--
Robert Munafo -- mrob.com
Follow me at: gplus.to/mrob - fb.com/mrob27 - twitter.com/mrob_27 - mrob27.wordpress.com - youtube.com/user/mrob143 - rilybot.blogspot.com |
From: Alan T. <ala...@gm...> - 2011-09-25 18:04:18
|
Let's make something as simple, generic and powerful as rule tables, or somehow use rule tables.

On 8 August 2011 13:25, Tim Hutton <tim...@gm...> wrote:
> On 8 August 2011 12:18, Tim Hutton <tim...@gm...> wrote:
> > Typically a toroidal topology is used but most of the time it wouldn't
> > be a problem to impose a fixed boundary condition and run it in a
> > finite area. Exceptions: the EOE examples with a spreading stripe,
> > complex Ginzburg-Landau - we'll struggle to make these work without a
> > wrap-around topology, I think.
>
> Oh, I forgot. Another way is to treat the edge cells as if they're
> their own neighbors. When the 'toroidal' flag is set to false I do
> that here:
>
> https://code.google.com/p/reaction-diffusion/source/browse/trunk/GrayScott/gray_scott.cpp#133
>
> This is probably our easiest option, it seems to work fine on all cases.
|
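The two edge treatments in the quoted text (wrap-around versus edge cells acting as their own neighbors) could be sketched like this. `neighbor_index` is a hypothetical helper written for illustration; it is not taken from gray_scott.cpp.

```cpp
#include <cassert>

// Hypothetical helper: map a neighbour coordinate i, which may fall
// just outside the grid (e.g. -1 or n), onto a valid index in [0, n).
//  - toroidal:     wrap around to the opposite edge
//  - non-toroidal: clamp, so an edge cell is treated as its own neighbour
inline int neighbor_index(int i, int n, bool toroidal) {
    assert(n > 0);
    if (toroidal) {
        return ((i % n) + n) % n;  // the extra "+ n" keeps negative i in range
    }
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}
```

For a row of 10 cells, `neighbor_index(-1, 10, true)` gives 9 (the far edge), while `neighbor_index(-1, 10, false)` gives 0 (the edge cell itself).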
From: Andrew T. <an...@tr...> - 2011-09-24 23:57:51
|
Maks:

> Good to hear! How did you time this? Is there a benchmark mode that
> I'm not aware of?

If you hit shift-T (or assign whatever shortcut you like for "Show Timing" in Prefs > Keyboard) then Golly displays the gens per sec value for the most recent pattern. So I just set the viewport to a fixed size (say 50x50 cells at 1:16) on each platform and run rabbits.lif for about 10 secs, stop and hit shift-T.

> By the way, I think the speedup depends on the size of the viewport and
> the algorithm used; QuickLife in particular creates relatively large
> bitmaps, so there is more to be gained from cropping them to the viewport.

It would be nice to create a benchmark script that would set everything up, do the runs and spit out the results (in the clipboard or help window?). One thing we're missing though is a script command that sets the viewport to a specified size (in pixels), then our script could call setview(800,800). If we also implement wd,ht = getview() then the script could restore the original size when it finishes.

Andrew |
From: Maks V. <mak...@ge...> - 2011-09-24 21:54:45
|
On 09/23/2011 11:18 PM, Tom Rokicki wrote:
> We should definitely time it, though.

Looks like Andrew beat me to it:

On 09/24/2011 01:26 PM, Andrew Trevorrow wrote:
> On my Linux system Golly now renders 350% faster at scale 1:16
> I also saw improvements on my Mac (60% faster) and Windows (17%
> faster).

Good to hear! How did you time this? Is there a benchmark mode that I'm not aware of? (I know bgolly has some options, but that doesn't help with the GUI stuff.)

By the way, I think the speedup depends on the size of the viewport and the algorithm used; QuickLife in particular creates relatively large bitmaps, so there is more to be gained from cropping them to the viewport.

- Maks. |
From: Andrew T. <an...@tr...> - 2011-09-24 11:26:39
|
Maks:

> One of the things that always bothered me about Golly on Linux is that
> rendering is very slow. To improve this, I modified the pixblit() code
> of the wx_render class so that it clips the pixmap to be rendered
> *before* creating the wxBitmap that is written to, because rendering
> large bitmaps is apparently very slow with GTK/X.

Much thanks! On my Linux system Golly now renders 350% faster at scale 1:16. I also saw improvements on my Mac (60% faster) and Windows (17% faster).

I wouldn't spend too much time trying to eke out further improvement. Ultimately I think we'll be better off abandoning wx's DrawBitmap stuff and switching to OpenGL. I know I've been threatening that for years, but once 2.3 is out the door (hopefully before the end of the year) I plan to start playing with OpenGL. If anybody wants to beat me to it, feel free!

Andrew |
From: Tom R. <ro...@gm...> - 2011-09-23 21:18:35
|
I think this may actually also benefit other platforms. I certainly don't have anything against this. We should definitely time it, though.

On Fri, Sep 23, 2011 at 2:06 PM, Maks Verver <mak...@ge...> wrote:
> Hi everyone,
>
> One of the things that always bothered me about Golly on Linux is that rendering is very slow. To improve this, I modified the pixblit() code of the wx_render class so that it clips the pixmap to be rendered *before* creating the wxBitmap that is written to, because rendering large bitmaps is apparently very slow with GTK/X.
>
> For me, zoomed rendering still isn't nearly as fast as unzoomed yet, but it is noticeably quicker. An important benefit of the clipping code is that making Golly's window smaller actually speeds up rendering, which it didn't before, and similarly, the rendering time isn't multiplied by the number of visible (tiled) layers any more, which makes that feature a lot more usable.
>
> I think this shouldn't cause problems on other platforms, but you may want to watch out for them nonetheless. The only downside I could think of is that pixblit() reallocates the bitmap buffer more often because clipping creates many differently sized bitmaps. On Linux that doesn't seem problematic (the benefits outweigh the costs), but if this degrades performance on other platforms, we should think of a way to fix this. (Disabling the clipping code on those platforms entirely is always possible, but we can probably do better.)
>
> - Maks.

--
-- http://cube20.org/ -- http://golly.sf.net/ -- |
From: Maks V. <mak...@ge...> - 2011-09-23 21:06:58
|
Hi everyone,

One of the things that always bothered me about Golly on Linux is that rendering is very slow. To improve this, I modified the pixblit() code of the wx_render class so that it clips the pixmap to be rendered *before* creating the wxBitmap that is written to, because rendering large bitmaps is apparently very slow with GTK/X.

For me, zoomed rendering still isn't nearly as fast as unzoomed yet, but it is noticeably quicker. An important benefit of the clipping code is that making Golly's window smaller actually speeds up rendering, which it didn't before, and similarly, the rendering time isn't multiplied by the number of visible (tiled) layers any more, which makes that feature a lot more usable.

I think this shouldn't cause problems on other platforms, but you may want to watch out for them nonetheless. The only downside I could think of is that pixblit() reallocates the bitmap buffer more often because clipping creates many differently sized bitmaps. On Linux that doesn't seem problematic (the benefits outweigh the costs), but if this degrades performance on other platforms, we should think of a way to fix this. (Disabling the clipping code on those platforms entirely is always possible, but we can probably do better.)

- Maks. |
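The clipping idea described above amounts to intersecting the pixmap's rectangle with the viewport before any bitmap is created. A minimal sketch of that intersection follows; `Rect` and `clip_to_viewport` are hypothetical names, coordinates are assumed to be in screen pixels, and this is not the actual pixblit() change.

```cpp
#include <algorithm>  // std::max, std::min

// Hypothetical rectangle type: top-left corner plus size, in pixels.
struct Rect {
    int x, y, w, h;
};

// Intersect the pixmap's rectangle with the viewport. Only the returned
// region would need a bitmap; w == 0 or h == 0 means nothing is visible.
inline Rect clip_to_viewport(const Rect& pixmap, const Rect& view) {
    const int x0 = std::max(pixmap.x, view.x);
    const int y0 = std::max(pixmap.y, view.y);
    const int x1 = std::min(pixmap.x + pixmap.w, view.x + view.w);
    const int y1 = std::min(pixmap.y + pixmap.h, view.y + view.h);
    return Rect{x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0)};
}
```

A 1000x1000 pixmap overlapping an 800x600 viewport is cut down to at most 800x600, which is why a smaller window would mean less work per frame.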
From: Andrew T. <an...@tr...> - 2011-09-22 08:45:01
|
Maks:

> Maybe old versions handled an argument of NULL differently, except that
> if I test this on a system with Perl 5.10:
>
> perl -e 'use Carp; eval "croak"; print "($@)\n"'
>
> It prints the " at $file.." stuff too. So I don't see how it was ever
> possible to terminate a script with croak() and not have it result in
> g_fatal() being called with some non-empty error message.

Presumably embedded Perl behaves a bit differently. I know very little about Perl's innards!

>> Anyway, I think the safest fix is to add a check in pl_fatal:
>> [..]
>> Let me know if that works.
>
> That works fine. Should I commit this?

Yes please.

> (It still bugs me on principle that I don't understand why this change
> is necessary!)

I've learnt to live in a permanently bugged state. :)

Andrew |
From: Tom R. <ro...@gm...> - 2011-09-22 03:40:52
|
I have no opinion on this. So if someone wants to change the default, I have no qualms about it.

On Wed, Sep 21, 2011 at 8:32 PM, Maks Verver <mak...@ge...> wrote:
> On 09/22/2011 02:00 AM, Jason Summers wrote:
>> Thanks, but I'd prefer not to have any say in the design of Golly's user interface.
>
> Since that makes three people in favour and two without a specific preference, I think I should go ahead and implement this.
>
> Note that for existing users this doesn't change anything, because the old default will already be saved in the settings, but I do think this will be more intuitive to new users.
>
>> (1.) The de facto standard is that rolling the wheel *toward* you pushes the objects *away* from you (makes them smaller). It wasn't the standard at the time this was originally implemented. And it will never feel right to me, but that battle is over, and my side lost.
>
> I think the question is whether you view the mouse wheel as manipulating the object on screen or you (the viewer). When you are viewing a page (for example, in a word processor or a browser) then scrolling down ("backward") actually moves the page UP, or the viewer DOWN.
>
> If you agree that this is right, then scrolling backward should also move the viewer away from the grid (i.e. zooming out), wouldn't you agree?
>
> - Maks.

--
-- http://cube20.org/ -- http://golly.sf.net/ -- |
From: Maks V. <mak...@ge...> - 2011-09-22 03:32:33
|
On 09/22/2011 02:00 AM, Jason Summers wrote:
> Thanks, but I'd prefer not to have any say in the design of Golly's user
> interface.

Since that makes three people in favour and two without a specific preference, I think I should go ahead and implement this.

Note that for existing users this doesn't change anything, because the old default will already be saved in the settings, but I do think this will be more intuitive to new users.

> (1.) The de facto standard is that rolling the wheel *toward* you pushes
> the objects *away* from you (makes them smaller). It wasn't the standard
> at the time this was originally implemented. And it will never feel
> right to me, but that battle is over, and my side lost.

I think the question is whether you view the mouse wheel as manipulating the object on screen or you (the viewer). When you are viewing a page (for example, in a word processor or a browser) then scrolling down ("backward") actually moves the page UP, or the viewer DOWN.

If you agree that this is right, then scrolling backward should also move the viewer away from the grid (i.e. zooming out), wouldn't you agree?

- Maks. |
From: Maks V. <mak...@ge...> - 2011-09-22 03:17:34
|
On 09/22/2011 04:30 AM, Andrew Trevorrow wrote:
> Does it make any difference if you add a ";" at the end?
> Or if you prepend a line with "use strict;"?

Neither seems to make a difference.

> Note that g_fatal is only called if $@ is non-empty, so for some
> reason 5.14 has changed behavior and is setting $@.
>
> So it looks to me like the Perl_croak(aTHX_ NULL) call in pl_exit *is*
> setting $@ to a non-empty string, which doesn't make a lot of sense given
> that we're passing NULL for the msg!

Yes, croak sets the string to "$message at $file line $lineno", so even if the message is empty, you get a string of the form " at $file line $lineno".

Maybe old versions handled an argument of NULL differently, except that if I test this on a system with Perl 5.10:

   perl -e 'use Carp; eval "croak"; print "($@)\n"'

It prints the " at $file.." stuff too. So I don't see how it was ever possible to terminate a script with croak() and not have it result in g_fatal() being called with some non-empty error message.

> Note that the pl_exit code calls AbortPerlScript() just before
> calling Perl_croak. That sets scripterr to abortmsg, which is checked
> in CheckScriptError to prevent any message box appearing.

True, but the problem is that scripterr is overwritten in pl_fatal().

> Anyway, I think the safest fix is to add a check in pl_fatal:
> [..]
> Let me know if that works.

That works fine. Should I commit this?

(It still bugs me on principle that I don't understand why this change is necessary!)

- Maks. |
From: Andrew T. <an...@tr...> - 2011-09-22 02:30:34
|
Maks:

> I've noticed that with my latest version of Golly and Perl (5.14.1) Perl
> scripts cause an error message to be popped up when the script calls
> g_exit. (There is no such error if the script exits normally, and
> Python scripts do not have this problem.)
>
> So for example, I have a script that just contains:
>
> g_exit("Bye!")

Does it make any difference if you add a ";" at the end? Or if you prepend a line with "use strict;"? Maybe Perl 5.14 is less forgiving about such things.

> When run, the status bar shows "Bye!" (which is right) and then a
> message box pops up with title "Perl error" and contents " at
> /path/to/script.pl line 1." (which should not happen).
>
> I tried to investigate what causes this, but I'm confused. From what
> I've gathered, wxperl.cpp doesn't run a script directly, but executes a
> snippet of code like this:
>
> do "script.pl"; g_fatal($@) if $@;
>
> This causes errors during compilation/execution of the script to be
> stored in $@ to be passed to g_fatal(), which saves them in scripterr
> which is in turn used in wxscript.cpp to pop up the error message.

Note that g_fatal is only called if $@ is non-empty, so for some reason 5.14 has changed behavior and is setting $@.

> All of this makes sense because g_exit() calls croak() to abort the
> script and that causes the error message to be stored in $@, which
> causes the call to g_fatal().

Note that the pl_exit code calls AbortPerlScript() just before calling Perl_croak. That sets scripterr to abortmsg, which is checked in CheckScriptError to prevent any message box appearing.

So it looks to me like the Perl_croak(aTHX_ NULL) call in pl_exit *is* setting $@ to a non-empty string, which doesn't make a lot of sense given that we're passing NULL for the msg!

Anyway, I think the safest fix is to add a check in pl_fatal:

   if (scripterr == wxString(abortmsg,wxConvLocal)) {
      // this can happen in Perl 5.14 so don't change scripterr
      // otherwise a message box will appear
   } else {
      // store message in global string (shown after script finishes)
      scripterr = wxString(err, wxConvLocal);
   }

Let me know if that works.

Andrew |
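The logic of the proposed check can be tried outside Golly with plain std::string in place of wxString. This is only a sketch of the idea under stated assumptions: the marker text, `abort_script`, `record_fatal` and `should_show_error` are hypothetical stand-ins, not Golly's actual functions or values.

```cpp
#include <string>

// Stand-ins for the globals discussed above. The marker text is a
// placeholder assumption, not the real abortmsg value.
static const std::string abortmsg = "<abort marker>";
static std::string scripterr;

// Plays the role of the abort call made before croak: mark the error
// slot so that no message box is shown for a deliberate exit.
inline void abort_script() {
    scripterr = abortmsg;
}

// Plays the role of pl_fatal: receives the contents of Perl's $@.
inline void record_fatal(const std::string& err) {
    if (scripterr == abortmsg) {
        // Deliberate exit already recorded; keep the marker so the
        // " at script.pl line 1." text from croak is not displayed.
        return;
    }
    scripterr = err;  // genuine error, shown after the script finishes
}

// Plays the role of the final check: show a dialog only for real errors.
inline bool should_show_error() {
    return !scripterr.empty() && scripterr != abortmsg;
}
```

With this guard the order of calls matters: the abort marker must be set before the croak text arrives, which matches the sequence described in the message above.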
From: Jason S. <ja...@po...> - 2011-09-22 01:58:04
|
Andrew Trevorrow wrote:
> Maks:
>
>> 1. Switch the default mousewheelmode from 1 to 2.
>> 2. Make mouse wheel zooming relative to the cursor's position.
>
> I've no objections to these changes, but that's only because I don't
> use my mouse's wheel mode all that often. Jason Summers wrote that
> code so it would probably be a good idea to double-check with him
> before making any changes.

Thanks, but I'd prefer not to have any say in the design of Golly's user interface.

(1.) The de facto standard is that rolling the wheel *toward* you pushes the objects *away* from you (makes them smaller). It wasn't the standard at the time this was originally implemented. And it will never feel right to me, but that battle is over, and my side lost.

--
Jason Summers |
From: Maks V. <mak...@ge...> - 2011-09-22 00:58:46
|
Hi everyone,

I've noticed that with my latest version of Golly and Perl (5.14.1) Perl scripts cause an error message to be popped up when the script calls g_exit. (There is no such error if the script exits normally, and Python scripts do not have this problem.)

So for example, I have a script that just contains:

   g_exit("Bye!")

When run, the status bar shows "Bye!" (which is right) and then a message box pops up with title "Perl error" and contents " at /path/to/script.pl line 1." (which should not happen).

I tried to investigate what causes this, but I'm confused. From what I've gathered, wxperl.cpp doesn't run a script directly, but executes a snippet of code like this:

   do "script.pl"; g_fatal($@) if $@;

This causes errors during compilation/execution of the script to be stored in $@ to be passed to g_fatal(), which saves them in scripterr which is in turn used in wxscript.cpp to pop up the error message.

All of this makes sense because g_exit() calls croak() to abort the script and that causes the error message to be stored in $@, which causes the call to g_fatal().

So by this logic I understand why calling g_exit() leads to the message box being displayed. What I don't understand is how this code was intended to work, or why this wouldn't happen before.

Does anyone know what to make of this? Did something change in Perl 5.14 that caused this? If so, how did this work correctly before?

- Maks. |
From: Tim H. <tim...@gm...> - 2011-09-21 11:24:51
|
On 19 September 2011 17:34, Tim Hutton <tim...@gm...> wrote:
> Is there a way to avoid the problem, do you know? I'm guessing doubles
> have the same problem, so we've only reduced the problem, not avoided
> it completely.
>
> Should I manually round small values to zero?

I tried this and it worked. From 100 fps to 1000 fps.

https://code.google.com/p/reaction-diffusion/source/browse/trunk/SpeedComparisons/GrayScott/gray_scott.cpp#187

> On 19 September 2011 16:28, Tom Rokicki <ro...@gm...> wrote:
>> I was going to suggest the issue is denormals. I think it's pretty probable.
>>
>> On Mon, Sep 19, 2011 at 7:37 AM, Tim Hutton <tim...@gm...> wrote:
>>> On 18 September 2011 00:38, Tim Hutton <tim...@gm...> wrote:
>>>> On 17 September 2011 23:24, Andrew Trevorrow <an...@tr...> wrote:
>>>>>>> Single core: 1,600 fps
>>>>>>> OpenCL on AMD Radeon HD 6970M: 140,000 fps
>>>>>>
>>>>>> Wow that's a *lot* faster in the single-core case than on my machine.
>>>>>> How interesting.
>>>>>
>>>>> The figure of 1,600 fps is after the spots have filled the window.
>>>>> When the single core version first starts up I see fps rates of about 450.
>>>>> The rate then gradually increases as the spots fill the window.
>>>>> Is that what you see?
>>>>
>>>> Yes, I do. My reported figure was from near the beginning, so I will
>>>> have another look at that.
>>>
>>> Just left it running for a bit and I get 1100 fps on a single core.
>>> But it's under 100 fps at the slowest point.
>>> (This is on an Intel i7-2600, rated at 3.4 GHz.)
>>>
>>> The speed seems to be related to the content of the scene. Change the
>>> if-statement in init() to read:
>>>
>>> if(hypot(i%50-25,j%50-25)<=frand(2,5))
>>>
>>> (this puts lots of dots into the starting frame instead of one)
>>>
>>> On my machine this immediately hits 1100 fps and stays there. With
>>> OpenMP I now get 5000 fps. Still only 14000 fps in OpenCL though.
>>>
>>> Maybe we are hitting the problem of 'denormals' - floating point underflow.
>>> http://www.cygnus-software.com/papers/x86andinfinity.html
>>> But Visual Studio 9 Pro is saying "D9002 : ignoring unknown option
>>> '/Qftz'" so maybe there's a reason it's not supported.
>>> Someone here suggested using doubles instead of floats:
>>> http://forums.nvidia.com/index.php?s=115f66f09e4dcc7e6771d8a06ff57246&showtopic=188850&view=findpost&p=1173133
>>> I don't understand why that causes a value-dependent slowdown but
>>> changing to doubles throughout does greatly improve things in my
>>> build. And they suggest that using SSE avoids this problem too. I'm
>>> now going to try to understand what SSE is.

-- Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/ |
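The "manually round small values to zero" fix described above can be sketched roughly as follows. This is not the code at the linked gray_scott.cpp revision; the names flush_small and flush_grid and the 1e-30 threshold are assumptions chosen for illustration. The idea is only that any value far below what matters to the simulation is replaced by exactly zero before it can decay into the denormal range.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: values whose magnitude is below `threshold` become
// exactly zero, so later arithmetic never sees denormal operands.
// The default threshold is an assumption; it only needs to sit well above
// the denormal range (below roughly 1.2e-38 for float).
inline float flush_small(float v, float threshold = 1e-30f)
{
    assert(threshold >= 0.0f);
    return (std::fabs(v) < threshold) ? 0.0f : v;
}

// Apply the same clamp to every cell of a grid of concentrations.
inline void flush_grid(std::vector<float>& grid, float threshold = 1e-30f)
{
    for (float& v : grid) {
        v = flush_small(v, threshold);
    }
}
```

Calling flush_grid once per time step adds a compare per cell, which is usually cheap next to the cost of denormal arithmetic, but the right trade-off would need measuring on the real update loop.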
From: Mark J. <mar...@gm...> - 2011-09-21 11:03:49
|
I'm also in favor of these changes.

Incidentally, I recently implemented similar scroll behavior in one of my own applications, so if Golly uses a compatible internal data structure, then I could give you the ready-working algorithms (I worked on it for several days; the algorithms aren't easy to perfect).

How mine works:
- I have two AffineTransform instances that work in tandem; one is the forward transform and one is the backward one (because an affine transform is not easy to invert, if possible at all; this is a tradeoff). It could probably be done with a single instance of a homothetic transform (which stores just zoom and offset values) which is easy to invert, but I didn't feel like writing a new class while I could use existing ones.
- The transform translates viewport values to world values. The viewport values are pixel values relative to the upper-left corner of the window.

Mark

On Wed, Sep 21, 2011 at 10:36 AM, Dave Greene <dav...@gm...> wrote:
> >> 1. Switch the default mousewheelmode from 1 to 2.
> >> 2. Make mouse wheel zooming relative to the cursor's position.
> >
> > I've no objections to these changes, but that's only because I don't
> > use my mouse's wheel mode all that often.
>
> I'm strongly in favor of both of those changes, especially the second
> one. I suspect that somebody somewhere will have a "standard"
> application where scrolling up zooms out, and will consider up=in to
> be counter-intuitive. So #1 could certainly start a war of some
> kind...
>
> On the other hand, I've been keeping a "corrected" GollyPrefs (with
> the up=in setting) handy for each new build of Golly, for years now,
> because I find the up=out default to be downright infuriating. I
> think I can dig up at least a dozen applications I've used over the
> years where scrolling up zooms in, and if there are any exceptions
> I've conveniently suppressed them.
>
> Keep the cheer,
>
> Dave |
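For readers following the zoom discussion, here is a minimal sketch of the simpler alternative Mark mentions: a single transform that stores only a zoom and an offset, which makes the inverse trivial. It is neither Mark's code nor Golly's; the names Point, ViewTransform, toWorld, toViewport and zoomAt are all hypothetical. The zoomAt step shows one way to get "zoom relative to the cursor": remember the world point under the cursor, change the scale, then move the offset so that the same world point lands back under the cursor.

```cpp
#include <cassert>

struct Point { double x, y; };

// Hypothetical uniform zoom-plus-offset view transform (an assumption made
// for illustration, not an existing Golly class).
struct ViewTransform {
    double scale = 1.0;       // pixels per world unit
    Point origin{0.0, 0.0};   // world coordinate at the viewport's upper-left pixel

    // Viewport pixel -> world coordinate.
    Point toWorld(Point pixel) const {
        return { origin.x + pixel.x / scale, origin.y + pixel.y / scale };
    }

    // World coordinate -> viewport pixel (exact inverse of toWorld).
    Point toViewport(Point world) const {
        return { (world.x - origin.x) * scale, (world.y - origin.y) * scale };
    }

    // Multiply the zoom by `factor` while keeping the world point under
    // `cursor` (given in viewport pixels) fixed on screen.
    void zoomAt(Point cursor, double factor) {
        assert(factor > 0.0);
        const Point anchor = toWorld(cursor);    // world point under the cursor
        scale *= factor;
        origin.x = anchor.x - cursor.x / scale;  // re-solve toWorld(cursor) == anchor
        origin.y = anchor.y - cursor.y / scale;
    }
};
```

Zooming about the view center, as the zoom shortcut keys are described as doing, is the same call with the cursor replaced by the center pixel of the viewport.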
From: Dave G. <dav...@gm...> - 2011-09-21 08:37:02
|
>> 1. Switch the default mousewheelmode from 1 to 2.
>> 2. Make mouse wheel zooming relative to the cursor's position.
>
> I've no objections to these changes, but that's only because I don't
> use my mouse's wheel mode all that often.

I'm strongly in favor of both of those changes, especially the second one. I suspect that somebody somewhere will have a "standard" application where scrolling up zooms out, and will consider up=in to be counter-intuitive. So #1 could certainly start a war of some kind...

On the other hand, I've been keeping a "corrected" GollyPrefs (with the up=in setting) handy for each new build of Golly, for years now, because I find the up=out default to be downright infuriating. I think I can dig up at least a dozen applications I've used over the years where scrolling up zooms in, and if there are any exceptions I've conveniently suppressed them.

Keep the cheer,

Dave |
From: Andrew T. <an...@tr...> - 2011-09-21 05:32:56
|
Maks:

> 1. Switch the default mousewheelmode from 1 to 2.
> 2. Make mouse wheel zooming relative to the cursor's position.

I've no objections to these changes, but that's only because I don't use my mouse's wheel mode all that often. Jason Summers wrote that code so it would probably be a good idea to double-check with him before making any changes. Not sure if Jason still reads this list, so his email is ja...@po....

Andrew |
From: Maks V. <mak...@ge...> - 2011-09-21 03:46:05
|
Hi everyone,

I've noticed two oddities related to zooming using the mouse wheel in Golly:

1. By default, scrolling up zooms out, and scrolling down zooms in, while in all other applications I use (like browsers, photo editors, et cetera) scrolling up (usually combined with control) zooms in. There is already a preference setting to change the behaviour in Golly, but it defaults to scroll-up-zooms-out.
2. Zooming with the scroll wheel zooms in/out relative to the center of the view (like the zoom shortcut keys do) instead of relative to the mouse cursor location (like the zoom cursors do).

I think both of these are counter-intuitive, so I would like to change them in the following way:

1. Switch the default mousewheelmode from 1 to 2.
2. Make mouse wheel zooming relative to the cursor's position.

Since this would change existing behaviour instead of just adding a new feature, I wanted to ask if anyone opposes these changes? (There may be good reasons for the current behaviour that I wasn't aware of!)

- Maks. |
From: Tom R. <ro...@gm...> - 2011-09-19 22:42:26
|
One more thing. Focus on your existing CPU and exactly how that works. With SSE, you sacrifice portability, so generally you have multiple code paths for different architectures. (The SSE registers keep getting wider, and that will probably continue.) We'll figure out what code paths we want to support (what SSE level) as well as of course the straightforward ALU path. (Even the ALU path should be avoiding denormals, of course; in any practical sense, once you get small enough to be denormal in a "simulation", you are at zero, since MIN_FLOAT is incredibly small.)

On Mon, Sep 19, 2011 at 3:29 PM, Tim Hutton <tim...@gm...> wrote:
> None of this is in Golly yet. I'm using my reaction-diffusion project
> as a testbed. And to avoid complicating that project too much I've now
> moved the OpenCL stuff and the other comparisons into a separate
> SpeedComparisons folder.
>
> I'm having a look at some SSE tutorials. It's good for me to learn
> this stuff anyway of course. Some of my current questions: Is SSE
> Intel-specific? Is SSE2 anything to worry about? How does SSE relate
> to all the other on-chip stuff like MMX? How well supported is SSE on
> different operating systems?

-- -- http://cube20.org/ -- http://golly.sf.net/ -- |
From: Tom R. <ro...@gm...> - 2011-09-19 22:39:36
|
> I'm having a look at some SSE tutorials. It's good for me to learn
> this stuff anyway of course. Some of my current questions: Is SSE
> Intel-specific?

It's x86/x64 specific. In other words, AMD+Intel, pretty much.

> Is SSE2 anything to worry about? How does SSE relate
> to all the other on-chip stuff like MMX? How well supported is SSE on
> different operating systems?

SSE is just the enhanced MMX. SSE is fully supported on all modern operating systems, including the free ones. Some compilers are set up to automatically use SSE (sometimes you need to give them a command line switch to do so); if they can reasonably easily vectorize they will.

Now, I think you are doing a checkerboard update (not sure if this is true or not). If indeed you are doing a checkerboard update, in order to support SSE, you may need to separate your odd and even frames or some such. SSE wants to do the same operation on contiguous memory/operands. Alternatively, you can pack and unpack to rearrange things, but this slows things down.

You *probably* want to use floats instead of doubles. I don't know much about the numerical stability of reaction-diffusion, but floats get you 2X the operations per cycle.

SSE math can be faster than ALU math, over and above the 4- or 8- ops per instruction; there are a lot more registers and this lets the main ALU be used for addressing and the like.

For your problem, I'd focus first on single-core SSE; you should be able to get it to the point where you know where every cycle is going, and your speed is "theoretical" maximum. Then, to go multicore, we need to stripe the array and figure out how we want to arrange the coherence events for minimal impact.

-tom |
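To make "the same operation on contiguous memory" concrete, here is a small sketch using SSE intrinsics, with a scalar loop as the fallback path and for the leftover elements. It is not taken from the reaction-diffusion project; the function name axpy and the particular formula (out = a + k * b, the general shape of a "value plus time step times rate" update) are assumptions picked for illustration. Each pass of the vector loop handles four floats at once.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_loadu_ps, ...
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Hypothetical kernel: out[i] = a[i] + k * b[i] for n contiguous floats.
inline void axpy(const float* a, const float* b, float k,
                 float* out, std::size_t n)
{
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if HAVE_SSE
    const __m128 kv = _mm_set1_ps(k);            // k copied into all 4 lanes
    for (; i + 4 <= n; i += 4) {
        const __m128 av = _mm_loadu_ps(a + i);   // unaligned loads of 4 floats
        const __m128 bv = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(av, _mm_mul_ps(kv, bv)));
    }
#endif
    for (; i < n; ++i) {                         // scalar path and remainder
        out[i] = a[i] + k * b[i];
    }
}
```

A loop this simple is also the kind a compiler may vectorize on its own when given the right switches, so the intrinsics mostly matter when the data layout or the access pattern gets in the way.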
From: Tim H. <tim...@gm...> - 2011-09-19 22:30:11
|
None of this is in Golly yet. I'm using my reaction-diffusion project as a testbed. And to avoid complicating that project too much I've now moved the OpenCL stuff and the other comparisons into a separate SpeedComparisons folder.

I'm having a look at some SSE tutorials. It's good for me to learn this stuff anyway of course. Some of my current questions: Is SSE Intel-specific? Is SSE2 anything to worry about? How does SSE relate to all the other on-chip stuff like MMX? How well supported is SSE on different operating systems?

On 19 September 2011 17:41, Tom Rokicki <ro...@gm...> wrote:
> Well, frankly, we should be using SSE anyway. I'm sorry I didn't get time
> to spend on it this weekend, but in general using SSE is not bad at all.
> You set up your data correctly, and normally the compiler will swoop in
> and vectorize. If that doesn't work there are intrinsics.
>
> Even with SSE you need to deal with denormals.
>
> There's a control bit you can set to tell it to flush denormals to zero.
>
> I might get some time for this soon; I'm not sure. Frankly, if I were to
> experiment with this, I'd pull it out of Golly and put it in a tiny
> standalone program to experiment with . . .

-- Tim Hutton - http://www.sq3.org.uk - http://profiles.google.com/tim.hutton/ |
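The "control bit" for flushing denormals to zero that comes up in this thread can be set from C++ through the SSE intrinsics header. The sketch below assumes a compiler that provides xmmintrin.h and uses SSE for float arithmetic; the wrapper names enable_flush_to_zero and half_of_smallest_normal are hypothetical. The setting lives in the SSE control register of the calling thread, so it only affects SSE arithmetic and has to be applied in every worker thread.

```cpp
#include <cassert>
#include <cfloat>

#if defined(__SSE_MATH__) || defined(_M_X64)
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE
#define HAVE_SSE_FTZ 1
#else
#define HAVE_SSE_FTZ 0
#endif

// Turn on flush-to-zero for the calling thread's SSE unit.
// Returns false (and changes nothing) when built without SSE float math.
inline bool enable_flush_to_zero()
{
#if HAVE_SSE_FTZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    assert(_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
    return true;
#else
    return false;
#endif
}

// Halve the smallest normal float at run time. The exact result is a
// denormal; with flush-to-zero enabled the hardware returns 0 instead.
inline float half_of_smallest_normal()
{
    volatile float tiny = FLT_MIN;   // volatile blocks compile-time folding
    volatile float half = 0.5f;
    return tiny * half;
}
```

Calling enable_flush_to_zero() once at the start of each compute thread is the usual pattern; whether it removes the content-dependent slowdown reported above would still need to be measured.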