14.5 My ported program runs much slower!

Q: How come my program, which I ported from Borland/MS C and which doesn't use much I/O, still runs much slower under DJGPP?

A: Explore the following possible causes for this:

  1. You compiled the program without optimizations. Use at least -O2 to produce optimized code.

    If your program spends most of its time in a particular innermost loop, try enabling some of the optimization options that -O2 does not turn on. Some of these are described elsewhere in this FAQ; see speed-related optimization options.

  2. Your program makes extensive use of non-I/O services that require a mode switch (such as BIOS Int 10h, mouse services, etc.).

    You can tell how often your program switches to real mode by profiling it. In the profile, look at the proportion of time your program spends in the low-level library functions __dpmi_int (which calls real-mode DOS/BIOS services) and __dj_movedata (which moves data between the transfer buffer and your program). If this proportion is large, try rewriting your program to minimize its use of functions that require a mode switch, even at the price of more computation (a mode switch usually eats up hundreds of CPU cycles).

  3. Your program might be running out of available physical memory and paging to disk. Watch the disk activity to find out whether this is the reason. If it is, you will have to configure your system differently (see system configuration), or change the way your program allocates memory.

    Sometimes a device driver that uses extended memory takes up a significant portion of it, leaving less for DJGPP programs, which then begin to page and slow down. For example, Novell NetWare's VLM redirector and client software can use up to 0.5 MB of extended memory, even if you don't log into the network. The solution is either to avoid loading such resident software or to buy more memory.

  4. Your program uses a lot of floating-point math, and you run it on a machine without an FPU. A telltale sign of this is that a function called __djgpp_exception_processor appears near the top of the execution profile printed by Gprof. Because of the way FP emulation is implemented in DJGPP, it might be significantly slower than the way real-mode DOS compilers handle it. The solution is either to rewrite your code so that it doesn't use floating-point math in its inner loops, or to buy an FPU.

  5. Your program uses library functions or classes that DJGPP libc and the GNU C++ libraries implement less efficiently than the libraries of your previous compiler. Alternatively, you might rely heavily on functions that other compilers convert to inline code, whereas GCC doesn't inline most library functions. In either case, those functions will show up as "hot spots" in the program histogram produced by the Gprof profiler. If this turns out to be the problem, write your own optimized versions of those functions. For maximum performance, it's best to write them as inline assembly functions. If you find inefficient library functions, please report them by posting to the comp.os.msdos.djgpp news group, so that the people who maintain the library can fix them.

  6. The size of the code/data in the innermost loop might be close to the size of the CPU cache (either the L1 on-chip cache or the L2 cache on the motherboard). Compiling your program with a different compiler or a different combination of optimization options can cause the code to overflow the cache, which dramatically affects performance (usually by a factor of 2). Running your program with the cache disabled will help you see whether this is the problem: if two builds that differ greatly in speed run at about the same speed with the cache disabled, the cache is the likely culprit. If it is, try to rearrange your data/code, or use a different combination of optimization options.

  7. If the slow program was compiled for profiling (with the -pg switch), the slow-down might be due to a bug in the DJGPP library. See slow-down in profiled programs, for more about this.