Toy Ray Tracer Mark2 - An Application for Retrocomputing Environments

Once you have got your emulated mainframe up and running with your favourite operating system, the next problem is: "what are you going to do with it?".

Well, mostly the idea is to see what computing was like on that machine - how the system works, what utilities are available, how you write programs for it, etc.

But occasionally, it would be nice to have something "fun" to run on it. Probably everyone has their own ideas for what would be "fun" in this context, and most people who are serious about playing with "old" computers (emulated or otherwise) will be quite happy to sit down with a case of beer and the Fortran compiler (or the Assembler) and rustle something up. Unlike the Linux and Windows NT worlds, there isn't likely to be a whole lot of "ready to build" software on the Internet for your chosen beast.

Being an inveterate computer graphics type, my immediate thought was: "ray tracing!". After all, it is (in principle) simple, and you can get a lot of prettiness in your pictures for not all that much effort (relatively speaking).

My target machine was a CDC Cyber 173 running on Tom Hunter's wonderful Desktop Cyber emulator. The operating system for that machine is NOS 1.4, first introduced in October 1979 (although the build I am using - PSR 552 - is from 1982).

NOS 1.4 is a very interesting operating system - surprisingly different from today's mainstream systems (Windows NT and Linux) - but still very capable in its own way. It supports many languages - but C is not one of them.

Apart from assembler, the "best" language to choose for writing an application on NOS is probably Fortran. Fortran was the language of choice for the "number crunching" programs that were the Cyber's raison d'etre, so the compiler is highly developed and generates good optimised code. (An interesting alternative might be SYMPL - CDC's high level system programming language. However, SYMPL is maybe not so appropriate for "number crunching" and there would be no chance of running a SYMPL program on anything other than a CDC machine).

TRT2 is therefore written in standard Fortran-77. It should be possible to port it back (in time) to Fortran-IV - the main problems would probably be the list-directed IO and the (very few) CHARACTER things. Apart from the tedium of converting block-structured IFs to the equivalent GOTOs, that is. I believe the FORTRAN-H compiler for the versions of MVS, VM/CMS, etc., available for use under the Hercules IBM mainframe emulator is Fortran-IV compatible, so it probably isn't possible to run TRT2 "as is" on those platforms, unfortunately.

TRT2 is (genuinely) free software - in every sense of the word. Should you be crazy enough to want to use it, feel free to do anything you like with it. Apart from complain about it to the author!

Please note that if you are looking for a good general purpose raytracer, this is not it! You would be much better off looking at POV-Ray.

Why is this TRT2? The original Toy Ray Tracer was a program I wrote back in 1984 to run on a VAX-11/780. It had far fewer features than TRT2 and I lost the source for it long ago. It did convince me that ray tracing was too slow to be practical back then (for computer animation production). A conclusion you might well agree with after running TRT2 on your emulated machine!

Interestingly, it looks like with modern hardware, better algorithms and great care in the implementation, ray tracing might turn out to be the method of choice for rendering in the near future - especially when dealing with huge scene databases. The work of the Computer Graphics Group at Saarland University is particularly impressive. (Try a Google search for "coherent ray tracing" - an 8 million polygon scene rendered by ray tracing at 1.6 frames/sec on a dual 800MHz x86 P-III is astonishing.)


TRT2 Goals

TRT2 V0.1 Features

Please see the warning above! TRT2 is not a world beating general purpose ray tracer!

TRT2 Source Files

There are five source code files.

TRT2.TXT is a complete NOS batch job containing the source code in Modify format and a job that compiles (and effectively links) TRT2, saving the result as a "permanent file" called TRT2. After running this job, you can submit other jobs that run TRT2 with scene descriptions of your choice.

TRT2FT.TXT is a complete NOS batch job that runs Modify on the Modify format source code, and writes (almost) completely standard Fortran-77 to the (virtual) card punch.

trt2vms.for is the output of TRT2FT.TXT trivially modified so it will build on Alpha VMS systems. The changes were:

Use:

$ fort/opt/real_size=64 trt2vms
$ link trt2vms

to build it.

trt2linux.f is a modification of trt2vms.for so it will build on Linux using the GCC g77 compiler. In addition to using the Linux random number generator call, it was also necessary to change all REAL variables to DOUBLE PRECISION. There were two reasons for this:

Use:

$ g77 -O2 -o trt2linux -ffixed-line-length-none trt2linux.f

to compile and link it (the conversion of REAL to DOUBLE PRECISION was done crudely and made many lines "too long").

trt2vaxvms.for is a modification of trt2linux.f to build on VAX VMS systems. This uses DOUBLE PRECISION everywhere, as per trt2linux.f, with the VMS random number function and calls to measure elapsed time (sadly not the same calls as are needed for Alpha VMS). Use:

$ fort/opt/extend trt2vaxvms
$ link trt2vaxvms

to compile and link it.


TRT2 Sample Job Files

These files are all for use on the Cyber under NOS. They are complete jobs that can be submitted via the (virtual) card reader, and write the output image as a text file to the (virtual) card punch. The actual input to TRT2 is the stuff between the ~ and the } (which Desktop Cyber uses for end-of-record and end-of-file cards). To use these scenes on VMS or Linux, just cut out those lines and put them in a file (and feed that in on Fortran LUN 5 - the output image will appear on Fortran LUN 7).
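The cut-out step described above can also be scripted. What follows is only a minimal Python sketch and is not part of TRT2. It assumes that the ~ and } markers each sit at the start of a line of their own, and that the scene input is the single record between the first ~ and the next }. The function name extract_scene and the sample deck contents are made up for illustration.

```python
def extract_scene(deck_text):
    """Return the lines found between the '~' card and the '}' card.

    Assumption: each marker is the first character of its own line, and the
    scene input is the single record between the first '~' and the next '}'.
    """
    scene = []
    inside = False
    for line in deck_text.splitlines():
        if not inside:
            if line.startswith("~"):
                inside = True      # end-of-record card: scene input starts
            continue
        if line.startswith("}"):
            break                  # end-of-file card: scene input ends
        scene.append(line)
    return scene


# Hypothetical deck: the card contents are placeholders, not real NOS
# control statements or real TRT2 scene data.
sample_deck = "\n".join([
    "JOB AND CONTROL CARDS GO HERE",
    "~",
    "FIRST SCENE LINE",
    "SECOND SCENE LINE",
    "}",
])

print("\n".join(extract_scene(sample_deck)))
```

The returned lines can then be written to a file and fed to the VMS or Linux builds on Fortran LUN 5. A deck holding more than one ~ record would need a different rule for picking the right one.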

The images below were all rendered on the (emulated) Cyber 173 under NOS 1.4. The emulator version I use is a slightly hacked old V2.0 Beta 1. The hacks include back porting of the CP3446 card punch code from V2.1 and modifications to allow batch jobs to be submitted using a "watched directory". This lets me use a home-brew GUI interface to the Cyber where I can "drag-and-drop" a job card deck (file) into the (virtual) card reader directory. More on this elsewhere. The hardware is a now old and creaking SGI 1200 rack mount machine with dual 800MHz Intel P-III processors. Various bits such as the not-hot-plug disk frontplane and one of the SCSI controllers have died, but it continues to give sterling service as a virtual mainframe!

Here is a picture of the Cyber console while the machine is hard at work rendering an image:

If you would like to see the output from the job that compiles TRT2 please click here. Be warned that this is about 1MByte of PDF (the listings from the CDC FTN5 compiler are quite thorough).

TRT2GO.TXT is a single sphere lit by three directional lights (red, green, blue).

The listing from the Cyber job that created this image is here. This job actually took about 1 hour 6 minutes to run (wall clock time). All these images were rendered at 512 x 512 with various supersampling options ... whatever was needed to get a good result. Please note that when busy, the Cyber's clock runs slow by about a factor of 1.8 - for these jobs, anyway. Later versions of the emulator track "real time" much better than this.

TRT2TS.TXT is four transparent balls, with shadows.

The listing from the Cyber job that created this image is here. This one is a "biggy" - it took 10 hours 45 minutes wall clock time.

TRT2T1.TXT is one ball and one triangle with a shadow being cast.

The listing from the Cyber job that created this image is here. This one took 1 hour 10 minutes.

TRT2TB.TXT is one triangle and three balls with soft shadows from a finite area light source.

The listing from the Cyber job that created this image is here. This one is another "biggy" - it took 11 hours 53 minutes wall clock time.

TRT2DF.TXT is three balls at different distances from the camera, showing depth of field effects.

The listing from the Cyber job that created this image is here. This one took about 7 hours 34 minutes wall clock time.

TRT2HL.TXT is a helix of 200 transparent balls with shadows.

The listing from the Cyber job that created this image is here. This one took about 5 hours 20 mins. I thought it would be the longest running, but the acceleration scheme really helps here.

Thrilling stuff, eh?

All of these were planned out with the aid of graph paper, I'm ashamed to say! I have been thinking of using molecular data as a source of further models, but that may or may not happen.


Image Format Conversion Tools

Once you have an image in the form of a text file, you have to convert it to something more reasonable before you can view it. These two programs will convert the output from any of the Fortran versions of TRT2 to an uncompressed TIFF file:

cybimg.cpp can be built on either Windows NT or Linux. For Linux use:

$ g++ -o cybimg -DUNIX -DLINUX cybimg.cpp

cybimg.c can be built on Linux using:

$ gcc -o cybimg -DUNIX -DLINUX cybimg.c -lm

Either can be run using:

$ ./cybimg TextImageIn.txt TiffImageOut.tif

It is important to actually put the .tif on the output filename.


Implementations in Other Languages

Just for the sake of it (and because I must admit to writing a prototype in C++ in a style that could easily be converted to Fortran, so that I could debug the algorithms on a fast machine), there are two other source files that you might be interested in:

trt2.cpp is TRT2 in C++ (not making any use of C++'s good - or bad - features). It has been built on Windows NT (XP most recently) as well as Linux. To build it on Linux, use:

$ g++ -o c++trt2 -DUNIX -DLINUX trt2.cpp

c++trt2 is run like this:

c++trt2 SceneFile.in OutImage.tif

The output is a TIFF file rather than a text file! It is important to have the .tif on the end of the output file name you supply, by the way.

A Microsoft Visual C++ V6.0 project to build c++trt2 on NT is available here.

trt2.c is TRT2 in C, converted back from trt2.cpp. It hasn't been tested much, but it seems to be OK. To build it on Linux, use:

$ gcc -o ctrt2 -DUNIX -DLINUX trt2.c -lm

Usage is identical to c++trt2.


Scripts to Benchmark TRT2 on VMS, NT and Unix

There are a couple of scripts here that will build some suitable version of TRT2 and then run it on all the sample scenes, saving the elapsed times for each render.

runtrt.com is a VMS DCL script that can be submitted as a batch job. It runs the Alpha VMS Fortran version of TRT2. Everything you need to know will get saved in runtrt.log, of course (since VMS has a proper batch environment).

runtrtvax.com is a VMS DCL script that can be submitted as a batch job. It runs the VAX VMS Fortran version of TRT2. This is, sadly, slightly different from the Alpha version.

trt2_do_all.bat is an NT shell script that can be run from a Command Prompt window. Output is logged to trt2-nt-cpp-log.txt

trt2_do_all.csh is a CSH/TCSH shell script for Linux. It explicitly invokes the GNU version of time so elapsed times (etc.) get written to a log file - which is called: trt2-linux-f77-log.txt

trt2_do_all_irix.csh is a CSH/TCSH shell script for SGI IRIX. A log file is written to: trt2-irix-f77-log.txt


Benchmark Results for some machines at HCCC

Here are some possibly interesting timings from running TRT2 on various machines. These are all in elapsed seconds on otherwise unused systems. Since we are running a mixture of Fortran, C, and C++ programs here, make of this what you will. Actually, I suspect the comparisons are valid - give or take a foot. But that is a view that would be difficult to defend if pressed.

Scene     Ath64   P670   Boxx    Ath   P420  CHost  Alpha Octane  P1000   uVAX  Cyber RCyber  CE/CH
trt2go     4.20   5.44   5.80   7.18  10.29  12.78  37.41  50.08    322    360   3960   7920    307
trt2ts    36.18  38.52  62.08  60.80  93.44 115.61 281.83 419.46   2934   2939  38682  77364    335
trt2t1     4.20   5.55   8.51   7.14  10.45  12.97  34.36  55.38    399    408   4189   8378    323
trt2tb    35.25  40.60  62.42  64.49  78.59  96.63 241.43 401.59   3143   3019  42780  85560    442
trt2df    26.78  31.57  49.17  46.26  65.51  80.66 246.48 331.41   2304   2188  27240  54480    338
trt2hl    14.89  15.27  28.56  24.05  37.75  46.14  98.78 277.00   1204   1047  19200  38400    416

Ath64 is an AMD Athlon64 3000+ based machine (late 2004) running at 1.8GHz with 1GByte RAM running under Windows 2000 SP4. TRT2 (C++ version) was built using Microsoft Visual Studio 2003 with P4+ (/G7) optimizations selected. Thanks to Laurence Blunt for this result.

P670 is a Dell Precision 670 (late 2004) with 2 x 2.8GHz Xeon P4 EM64T processors and 2GBytes RAM (although TRT2 uses only about 1MByte). For this test, it was running Red Hat Enterprise Linux 3, AMD64 version (i.e. with x86_64 support). The compiler was GCC 3.2.3 (g77). TRT2 version was trt2linux.f. This machine clocks in with 5662.31 BogoMips.

Boxx is a Boxx "5D workstation" (a "special" configured for the now defunct 5D Solutions Ltd. - basically a Tyan Thunder i860 (S2603) motherboard). This hails from early-2002. This has 2 x 2.2GHz Xeon P4 processors and 1GByte RAM. It was running Red Hat Linux V8.0. The compiler was GCC 3.2 (g77) (trt2linux.f again). This machine clocks in with 4377.8 BogoMips.

Ath is an AMD Athlon based machine (2001) running at 1.2GHz with 512MBytes RAM running under Windows 2000 SP4. TRT2 (C++ version) was built using Microsoft Visual Studio 2003 with P4+ (/G7) optimizations selected. Thanks to Laurence Blunt for this result also.

P420 is a Dell Precision 420 (early 2001) with 2 x 1GHz Xeon P3 processors and 1 GByte RAM. It was running Windows XP Service Pack 1. The compiler was Microsoft Visual C++ V6.0 building from trt2.cpp.

CHost is the DtCyber host machine. It is a Silicon Graphics 1200 server (from 2000) with 2 x 800MHz Xeon P3 processors and 1GByte RAM. It was running Windows 2000 and used the same trt2 binary as P420.

Alpha is a DEC AlphaServer 800 5/500 (from 1997) with 1 x 500MHz 21164 processor and 640MBytes RAM. It was running OpenVMS 7.3-1. The compiler was Compaq Fortran V7.5-1961, building from trt2vms.for.

Octane is a Silicon Graphics Octane (from 1997) with 2 x 195MHz R10000 processors and 512MBytes RAM. It was running SGI IRIX 6.5.21f. The compiler was GCC 3.0.4 building from trt2linux.f.

P1000 is a Tadpole P1000 laptop (from 1995) with 1 x 100MHz Intel Pentium (full fat variety) and 32MBytes RAM. It was running Linux 1.2.1 (Slackware distribution of 1995), with GCC 2.6.3. It clocks in at 40.18 BogoMips.

uVAX is a DEC MicroVAX 3100-95 (from 1993) with 1 x KA-51 NVAX processor and 128MBytes RAM. It is used primarily as the HCCC Web server. It runs OpenVMS VAX V7.3. This is possibly the second fastest uniprocessor VAX built, and is rated at 32VUPs, i.e. it is supposedly 32 times faster than an 11/780. The compiler was Compaq Fortran 77 V6.6-201 building from trt2vaxvms.for.

Cyber is (of course) an emulated CDC Cyber 173 running NOS 1.4, all under DtCyber V2.0 Beta 1 (with local hacks). The emulator is set so that 5 CP instructions are executed for each PP instruction which seems to give the optimal elapsed time performance (this idea is due to Gerard van der Grinten and Paul Koning).

CE/CH is an indication of how much faster code is run directly on the host than under the DtCyber emulator. The geometric mean of these ratios is about 357.
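For anyone wanting to check the arithmetic, here is a small Python sketch that recomputes the CE/CH ratios and their geometric mean from the Cyber and CHost columns of the table above. It is an illustration only: the variable and function names are invented here, and individual ratios may differ by a few units from the tabulated CE/CH column.

```python
import math

# Elapsed seconds copied from the CHost and Cyber columns of the table above.
chost = {"trt2go": 12.78, "trt2ts": 115.61, "trt2t1": 12.97,
         "trt2tb": 96.63, "trt2df": 80.66, "trt2hl": 46.14}
cyber = {"trt2go": 3960, "trt2ts": 38682, "trt2t1": 4189,
         "trt2tb": 42780, "trt2df": 27240, "trt2hl": 19200}


def geometric_mean(values):
    """Geometric mean: exponential of the average of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Emulated-Cyber time divided by host time, scene by scene.
ce_ch = {scene: cyber[scene] / chost[scene] for scene in chost}
ce_ch_geomean = geometric_mean(ce_ch.values())

for scene, ratio in ce_ch.items():
    print(f"{scene}: {ratio:.0f}")
print(f"geometric mean: {ce_ch_geomean:.0f}")
```

The geometric mean is used rather than the arithmetic mean because these are ratios: it treats "twice as slow" and "twice as fast" symmetrically.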

Here is the relative performance, showing each system's run time as a multiple of the P670's (so the P670 is 1.00, and values below 1.00 mean a machine is faster than the P670):

Scene        Ath64    P670    Boxx     Ath    P420   CHost   Alpha  Octane   P1000    uVAX   Cyber  RCyber
trt2go        0.77    1.00    1.07    1.32    1.89    2.35    6.87    9.20   59.19   66.18  727.94 1455.88
trt2ts        0.94    1.00    1.61    1.57    2.42    3.00    7.31   10.88   76.16   76.29 1002.80 2008.41
trt2t1        0.76    1.00    1.53    1.29    1.88    2.34    6.19    9.97   71.89   73.51  754.77 1509.54
trt2tb        0.87    1.00    1.54    1.59    1.93    2.38    5.94    9.89   77.41   74.36 1053.69 2107.39
trt2df        0.85    1.00    1.56    1.47    2.07    2.56    7.81   10.49   72.98   69.31  862.84 1725.69
trt2hl        0.97    1.00    1.87    1.57    2.47    3.02    6.47   18.14   78.84   68.57 1257.36 2514.73
GeoMean       0.86    1.00    1.51    1.46    2.10    2.59    6.73   11.11   72.43   71.28  925.84 1851.89
ClkPeriod     1.56    1.00    1.27    2.33    2.80    3.50    5.60   14.36   28.00     ---     ---     ---
Work/Clk      1.81    1.00    0.84    1.59    1.33    1.35    0.83    1.29    0.38     ---     ---     ---
Date          2004    2004    2002    2001    2001    2000    1997    1997    1995    1993     ---    1973

GeoMean is the geometric mean of the ratios. ClkPeriod is the relative clock period (where known). Work/Clk is a relative measure of how much useful work was done per clock cycle compared to the P670 machine. The results are what you might expect. P3 Pentiums and Mips chips get more work done per MHz than P4 Pentiums and Alphas. The original Pentium (1995 vintage) does worst of all. Date is the approximate date of introduction.
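The text does not spell out how ClkPeriod and Work/Clk were calculated. The Python sketch below assumes that ClkPeriod is the P670's clock rate divided by the machine's clock rate, and that Work/Clk is ClkPeriod divided by GeoMean. That assumption reproduces the two rows of the table to within rounding, but it is a reconstruction rather than a documented method, and the names used are invented for the example.

```python
# Assumed reference: the P670 at 2.8GHz (its ClkPeriod is 1.00 in the table).
REFERENCE_MHZ = 2800.0

# Clock rates in MHz, taken from the machine descriptions above.
clock_mhz = {"Ath64": 1800, "P670": 2800, "Boxx": 2200, "Ath": 1200,
             "P420": 1000, "CHost": 800, "Alpha": 500, "Octane": 195,
             "P1000": 100}

# GeoMean row of the relative performance table.
geo_mean = {"Ath64": 0.86, "P670": 1.00, "Boxx": 1.51, "Ath": 1.46,
            "P420": 2.10, "CHost": 2.59, "Alpha": 6.73, "Octane": 11.11,
            "P1000": 72.43}


def clk_period(machine):
    """Clock period relative to the P670 (a slower clock gives a larger value)."""
    return REFERENCE_MHZ / clock_mhz[machine]


def work_per_clock(machine):
    """Assumed definition: relative clock period over relative run time."""
    return clk_period(machine) / geo_mean[machine]


for name in clock_mhz:
    print(f"{name:7s} ClkPeriod {clk_period(name):6.2f}  Work/Clk {work_per_clock(name):5.2f}")
```

Read this way, a Work/Clk above 1.00 means the machine needed fewer clock cycles than the P670 to do the same rendering work.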

Another interesting observation is how much more work is done per clock cycle by the AMD Athlon and (especially) the AMD Athlon64. These have a clear absolute performance advantage over the Intel processors of the same date too.


The Cyber 173 (real hardware!) dates from 1973. The version of NOS 1.4 the emulator runs is from around 1982. There is no sensible way to relate the performance of the emulated mainframe to the performance of the real hardware... however(!) ... Indulging in (more) rampant over-analysis ...

According to this rather interesting site (http://homepage.virgin.net/roy.longbottom/whetstone.htm#anchorControlData), the real Cyber 173 clocked in at 1.05MWIPS with Fortran OPT=2. For whatever it is worth, the DEC VAX-11/780 - once the canonical "1MIPS machine" - clocks in with 1.02MWIPS. So (tongue firmly in cheek) you might - incredibly roughly - think that the 1973 Cyber 173 had about the same performance as the 1978 VAX-11/780 with FPA. At this distance in time, "incredibly rough" is as good as it is likely to get!

Mark Riordan's "RANPI" benchmark (http://www.msu.edu/~mrr/mycomp/bench.htm) ran in 106.5 seconds on a MicroVAX-II. My DtCyber ran the same benchmark in 47 seconds. The MicroVAX-II was rated at 0.925MWIPS. So ... the emulator might be running:
(106.5 / 47) * (0.925 / 1.05) = 1.99 (or 2!)
times faster than the real Cyber 173 would have. And if you believe that, you will believe anything! RCyber gives the scaled timings should you choose to entertain such a ridiculous idea. A little more support is given by the geometric mean of the relative timings for the uVAX. This is supposed to be around 32 times faster than a VAX-11/780, and based on this, an 11/780 would be expected to have a relative performance of about 2280. Not too far from the 1852 for RCyber.
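The chain of estimates in the last two paragraphs can be laid out step by step. The Python sketch below just repeats the arithmetic using the figures quoted in the text; the constant names are invented for the example, and none of this makes the estimate any less rough.

```python
# Figures quoted in the text above.
MICROVAX_II_RANPI_SECONDS = 106.5
DTCYBER_RANPI_SECONDS = 47.0
MICROVAX_II_MWIPS = 0.925
CYBER_173_MWIPS = 1.05

# How much faster the emulator ran RANPI than a MicroVAX-II did, scaled by
# how much slower a MicroVAX-II is than a real Cyber 173 (by MWIPS rating).
emulator_speedup = ((MICROVAX_II_RANPI_SECONDS / DTCYBER_RANPI_SECONDS)
                    * (MICROVAX_II_MWIPS / CYBER_173_MWIPS))

# RCyber: the emulated Cyber's relative time scaled by the rounded factor of 2.
CYBER_GEOMEAN = 925.84
rcyber_relative = CYBER_GEOMEAN * 2

# Cross-check: the uVAX is rated at 32 VUPs, i.e. 32 times a VAX-11/780.
UVAX_GEOMEAN = 71.28
UVAX_VUPS = 32
vax_780_relative = UVAX_GEOMEAN * UVAX_VUPS

print(f"emulator speed-up over real Cyber 173: {emulator_speedup:.2f}")
print(f"RCyber relative time:                  {rcyber_relative:.0f}")
print(f"VAX-11/780 relative time (from uVAX):  {vax_780_relative:.0f}")
```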


This graph shows the result of adding points for other systems (those I happen to have used in the distant past). These were arrived at by using the MWIPS figures from Roy Longbottom's data and using the ratio of the MWIPS for a given system to the MWIPS for the Cyber 173 to scale the estimated RCyber time. E.g., for the 6600:

Estimated run time = 1852 * 1.05(MWIPS C173) / 2.09(MWIPS 6600) = 930
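The same scaling can be applied to any system with a known MWIPS rating. This Python sketch wraps the formula above in a function; only the two MWIPS figures quoted in the text are used, and the function name is invented for the example.

```python
RCYBER_RELATIVE_TIME = 1852.0   # GeoMean for RCyber from the table above
CYBER_173_MWIPS = 1.05          # real Cyber 173, Fortran OPT=2


def estimated_relative_time(mwips):
    """Scale the RCyber figure by the ratio of MWIPS ratings."""
    return RCYBER_RELATIVE_TIME * CYBER_173_MWIPS / mwips


# The 6600 example from the text, using 2.09 MWIPS.
print(round(estimated_relative_time(2.09)))
```

A higher MWIPS rating gives a proportionally smaller estimated time, which is all the graph's extra points amount to.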

Needless to say this is extremely speculative. Still quite interesting, though... You can interpret the picture in a few different ways. One possible conclusion would be that the "rule of thumb" version of Moore's Law - which has little to do with anything Moore actually said, probably - where you expect "performance" to double every 18 months and go up by an order of magnitude every 5 years is far too optimistic. A more reasonable estimate over the long run might be a factor of 12 every ten years. Even then, the CDC/Cray machines stand out as being much faster than one might expect for their time. This was noticed by others in these results too: http://www.jcmit.com/cpu-performance.htm
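To put the two growth rates side by side, the Python sketch below converts "doubling every 18 months" into a factor per 5 and 10 years, and converts "a factor of 12 every ten years" back into a doubling time. It is plain compound-growth arithmetic; the function names are invented for the example.

```python
import math


def growth_factor(years, doubling_months):
    """Total growth over `years` if performance doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def doubling_time_months(factor, years):
    """Doubling time implied by growing by `factor` over `years`."""
    return years * 12.0 * math.log(2.0) / math.log(factor)


rule_of_thumb_5_years = growth_factor(5, 18)      # close to an order of magnitude
rule_of_thumb_10_years = growth_factor(10, 18)    # close to two orders of magnitude
long_run_doubling = doubling_time_months(12, 10)  # for a factor of 12 per decade

print(f"18-month doubling over 5 years:  x{rule_of_thumb_5_years:.1f}")
print(f"18-month doubling over 10 years: x{rule_of_thumb_10_years:.1f}")
print(f"x12 per decade doubles every {long_run_doubling:.1f} months")
```

So the rule of thumb predicts roughly a hundredfold gain per decade, while a factor of 12 per decade corresponds to doubling only about every 33 months.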


Emulator Performance

Tom Hunter has got "RANPI" to run in 10 seconds on an AMD Athlon 64 3000+ machine. So the emulator can run today at maybe 10 times the speed of the original hardware. Considering how dissimilar the emulated hardware is from the real hardware running it (60 bit CP words, 12 bit PP words, ones complement integers, entirely different floating point format, a system comprising 11 - or more - distinct processors which must appear to run simultaneously (sort of - this is complicated), etc.) ... this is very impressive.

It would be interesting to run this set of sample jobs on other emulators for other hardware (on the same real host system, of course). It would be easy to do for SimH running VMS on an emulated VAX. As noted above, it would be possible to get TRT2 running on MVS or, perhaps, VM/CMS using the Hercules emulator - but it would require a rather painful conversion to Fortran-IV, I think.

It would also be interesting to profile DtCyber when running this stuff to see where it is actually spending its time. Almost all the time is likely to be spent in the floating point emulation code for this application - but it would be interesting to know for sure.


Further Information

For more information on the (horrible) format of the scene database or any other aspect of TRT2, please see the source code. SUBROUTINE READDB starts with a long comment block describing the scene database pretty completely.

Needless to say, everything is in "camera space" with the camera looking down the positive Z axis - bigger Z means farther away.

You can download all this stuff as a ZIP file from here.

If you are as crazy as I am and actually try using TRT2, you can contact me using: glazzarda (at) acm.org if you have any questions.

Last modified: 17th August 2005