scheduling latency tests / high performance low latency audio by Benno Senoner


I created this page and the benchmark mainly to analyze how well Linux handles
scheduling latencies of programs which must do things in realtime under high system load.
Low latencies are required in the multimedia field, because desktop users expect smooth audio/video performance on their powerful boxes.
Most of the problems (jerky video/choppy audio) are software related and mainly caused by the OS scheduler.
With the low latency patch, Linux can compete with or beat most desktop OSes in this field, allowing latencies as low as 2.1msec, which is especially useful in the realtime audio field.
BTW: this page is VERY chaotic (because of the lack of time to make a decent one). I plan a complete redesign and will move it to www.linuxdj.com within the next months, please be patient.

If you are searching for the linux audio home, please visit the Linux Audio Developer site:

www.linuxdj.com/audio/lad

There you will find what's happening on the cutting-edge audio scene, with links to all related discussions, software, software authors, related sites, archives, docs etc.

Do you want to switch from Windows to Linux and are you looking for information on how to use Linux on your desktop ? Then visit www.LinuxDesktop.it

News:

07/25/00:

  low latency 2.2.16 kernel RPMs for Redhat 6.x released ! easy to install.

  latencytest-0.42-png with PNG support is now available for those who have problems with the GIF version of the gd lib. Thanks to Chris Baugher (baugher@enteract.com) for the changes.

06/24/00:

  latencies of kernel 2.4.0-test2 benchmarked
summary: 2.4.0-test2 is not significantly better than 2.2.x (worst case latencies are still around 100ms).
Why ? Because almost nothing from Ingo's lowlatency work has made its way into the mainstream kernel.

06/09/00:

  don't forget to visit my www.linuxdj.com site
It hosts the Audio Quality HOWTO, the linux-audio-dev homepage,
(which talks about the hot latency topic too)
my hdrbench harddisk recording benchmark and other useful resources.
The goal is to provide all the resources needed by Linux developers and users, so that they can get the maximum out of the OS.

05/12/00:

  a low-latency patch for 2.2.15 is available at Ingo's site: get it here

Notice: I have tested this patch on my Celeron box, and it performs ok on a single processor system. (with a little tweaking you can apply it to the 2.2.16 kernel too, see the news section)
On SMP boxes the performance is still BAD (70ms latencies), but that will be fixed in the 2.3.x / 2.4.x kernels.

Unfortunately plain 2.2.15 does not seem to deliver the best disk I/O throughput, and neither does the latest 2.3.99-pre6-7. Bad news for us harddisk recording folks. :-( Anyway these bugs will surely be fixed very soon.

Montavista is now also working in the low-latency field on Linux, because they use it for embedded stuff.

They are currently improving the scheduler so that it performs nicely even when there are hundreds of processes active.

Plus, a couple of weeks ago I exchanged mail with an SGI engineer and he told me that SGI is interested in realtime 60-120 Hz flicker-free video (will the next Reality Engine run on Linux ?) Stay tuned !

I have been told that Steinberg, the maker of Cubase VST and Nuendo, is very interested in Linux because of its performance and growing demand.

If we are lucky, then within a year or less, we will be able to run VST plugins with latencies close to their hardware counterparts, with all the flexibility and performance that Linux brings ! (Hopefully Emagic will port their cool software like Logic to Linux as well)

02/01/00:

 released latency-graph API 0.1, which lets you generate diagrams similar to those produced by latencytest
from your application with little effort

09/16/99:

 performed tests on an old Pentium, again AMAZING results:
Hardware: P133, Mainboard Tyan Tomcat HX, RAM 64 MB, disk IBM 6GB EIDE, Soundcard TB Tropez plus, video ATI MACH64
Software: RH6.0 + KDE
I was able to get 2.1ms latency, but lowered the CPU load to 60%
because CPU load=80% made the box unusable
(it would be the same as working on a Pentium at 20-25MHz :-) )
look at the RESULTS (3x128 audio buffer)

09/11/99:

 latencytest 0.42 released: get it here: latencytest-0.42.tar.gz
for the PNG version see the News section
new features:

- added benchmarking of the Realtime Clock (RTC device), using async notification via SIGIO handler (you must apply a small patch to the kernel to enable this)
Thanks to Paul Barton Davis for this.
A typical use for the RTC device could be a high precision MIDI sequencer.

the max jitter is around 300-500usecs under high load, really impressive !
look at the excellent results:

RTC 2048HZ (0.5ms period) CPU load=80%

RTC 2048HZ (0.5ms period) CPU load=10%

more really good news: I discovered that I can reduce the audio buffer size
from 4.3ms to 2.1ms without losing reliability: that means audio latency is now down to ONLY 2.1ms !
look at the diagrams, the max jitter was only 700usecs (very sporadic) !

audio buffer 2.1ms (3x128) CPU load=80%

audio buffer 2.1ms (3x128) CPU load=10%

08/28/99:

 *EXCITING* NEWS:
things are getting almost perfect !
Ingo's lowlatency-2.2.10-N6 patch with the shm.c part backed out and a modification of filemap.c (thanks to Roger Larsson)
performs _REALLY_ well: using my usual latencytest parameters (4.3ms buffer),
I got NO DROP-OUTS anymore, with sporadic maximum peaks of ONLY 2.9ms.
This is really exciting because it opens the doors to a whole new class of realtime applications for Linux, simply using userspace processes scheduled SCHED_FIFO.
I have heard of comparably low latencies only from BeOS;
Windows simply can't guarantee this kind of latency, not even using DirectX.
Using a soft-synth on Win98 on my box I must use 15-20ms audio buffers to get _SOMEWHAT_ reliable audio.
This is about 3-4 times the buffer I used for testing under Linux (4.35ms).

See the testresults here
You can download the lowlatency-2.2.10-N6B patch here (obsolete and unstable, use the 2.2.15 ones or the kernel RPM)
for a patch for 2.2.13 see below in the Download section

As usual, to get these good results don't forget to tune your EIDE disks with hdparm -d 1 -m 8 -u 1 -c 1 /dev/your-ide-hd

The patch is rock solid: EIDE and SCSI disk access works well, but unfortunately some problems still remain:

- my ISDN Hisax driver (Fritz Card classic) crashes when I load the module.
This does not happen on a standard 2.2.10 kernel; it would be nice if the ISDN people could figure out if this is due to some sort of race.
I have not found any instability except this ISDN problem.

- Disk performance decreases by about 10-25% under high CPU load; maybe this is caused by the higher scheduling overhead when you run lowlatency apps like my latencytest.



07/31/99:

 new testresults released using the mingo-lowlatency-2.2.10-N2 patch
get the patch here
The patch is still not perfect but helps very much; the /proc stress result is perfect.
On the disk I/O tests there are still some sporadic problems.
With mem=256m I get up to 28ms latency on the disk write test;
performing the same test with mem=64m, the latency goes down to about 11ms.
It seems that the problem is still correlated to the RAM size. (Does anyone have an idea which nasty kernel routine could be the cause of this behavior ?)
Otherwise Mingo's patch produces really nice diagrams: the latency jitter is much smaller than without the patch, just compare with a non-patched kernel. :-) Here are the results:

mem=256m
mem=128m
mem=64m

07/31/99:

 new testresults released (using a mixed patch from Andrea+Mingo)
 put online a latency-profiling patch from Roger Larsson, which allows you to see which parts of the kernel have long execution paths (not tested yet)
 get the patch for 2.2.10 here
note that you must use ksyms -a to translate addresses into function names

07/06/99:

 latencytest-0.41  released:
 added USE_GENERIC_TIMER option to allow compilation on non-Intel architectures ( uses gettimeofday() instead of RDTSC)

07/03/99:

 latencytest-0.4  released:
 produces nice scheduling latency analysis charts

Download:

get the benchmark here: latencytest-0.42.tar.gz
 
for the PNG version see the news section
lowlatency-2.2.10-N6B.patch (this patch allows <3ms scheduling latencies on PII boxes, even under high disk I/O)
The patch is outdated and unstable, use the newer 2.2.15/16 versions, see above.

lowlatency-2.2.13-A1.patch (note that some people reported that the patch doesn't work as well as the 2.2.10 version
due to some remaining bugs/problems, but Ingo is about to release a definitive patch (which fixes the problems) for
2.2.13 and for 2.3.x ( yes, it will get into linux 2.4 !),
I will post the patches on my site as soon as they are available)

 latency-graph API 0.1
which lets you generate latencytest-like diagrams from your application.

Introduction

The main purpose of the program is to measure, under high system load, the scheduling latencies of programs which must do things in realtime.

Currently there are 5 classes of operating system stress:

- heavy graphics output, using x11perf to simulate large
  BitBlts

- heavy access to the /proc filesystem using "top" with an update interval of 0.01 sec

- disk write stress (write a large file to disk)
- disk copy  stress (copy a large file to another)
- disk read  stress (read a large file from disk)
 

I wrote the benchmark to test the realtime audio (PCM I/O) capabilities of Linux.
In future I will extend the program to test other subsystems, like MIDI I/O, serial I/O and the use of usleep().

The playing is done strictly from RAM.
The player thread gets RT priority through sched_setscheduler() and is scheduled with FIFO policy at maximum priority.

The player sits in a loop which basically does the following:

while(1)
{
  time1=my_gettime();
  /* waste 80% of the CPU for the duration of one audio fragment */
  time2=my_gettime();
  write(audio_fd,playbuffer,fragmentsize);
  time3=my_gettime();
}

time3-time1 = duration of one loop (CPU wasting + audio output)

If this time gets bigger than the audio buffer (n fragments) then you will hear an audio dropout.

time2-time1 = duration of the CPU wasting loop. It should be constant at 80% of the fragment duration,
but can vary if there is some device on the bus (DMA/PCI contention ?) or a kernel I/O routine which steals cycles from the CPU.

On some graphics cards, heavy graphics output blocks the bus for too long; the process is then blocked too long as well, and
the deadline (in this case the audio buffer duration) will be missed.
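
The measurement loop described above can be sketched in C roughly as follows. This is only a minimal illustration, not the latencytest source: my_gettime() is written here on top of gettimeofday() (the generic timer selected by USE_GENERIC_TIMER), waste_cpu() and measure_cpu_loop() are hypothetical helper names, and the write() to the audio device is left out so that the sketch runs without a soundcard.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Stand-in for the benchmark's my_gettime(): wall-clock time in microseconds. */
static long long my_gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Hypothetical helper: busy-wait (no sleeping) until `usecs` have elapsed. */
static void waste_cpu(long long usecs)
{
    long long start = my_gettime();
    while (my_gettime() - start < usecs)
        ;   /* spin on purpose: the point is to consume CPU time */
}

/* Run `loops` iterations of the CPU-wasting step and return the worst
 * time2-time1 in microseconds. fragment_usecs is the playing time of
 * one audio fragment. */
static long long measure_cpu_loop(int loops, long long fragment_usecs)
{
    long long worst = 0;
    assert(loops > 0 && fragment_usecs > 0);
    for (int i = 0; i < loops; i++) {
        long long time1 = my_gettime();
        waste_cpu(fragment_usecs * 8 / 10);   /* 80% of one fragment */
        long long time2 = my_gettime();
        /* the real benchmark does write(audio_fd, ...) here and takes time3 */
        if (time2 - time1 > worst)
            worst = time2 - time1;
    }
    return worst;
}
```

With a 1450 usec fragment, measure_cpu_loop(1000, 1450) should return a value close to 1160 usecs on an idle machine; anything well above that is time taken away from the process.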
 
 

My Hardware:

CPU: PII 400
Mainboard: ASUS P2B BX chipset
Harddisk: IBM Deskstar 16 GB EIDE (UDMA)
Graphics card: Matrox G100 4MB AGP

Note that my harddisk and gfx card don't suffer from the bus blocking problems
(I tested this under Win98 using 20ms audio buffers during high disk I/O, and there were no dropouts)
 
 
 

Tests with lowlatency-2.2.10-N6 patch : VERY GOOD RESULTS

As said above, Linux now becomes usable for realtime apps which need response times in the 1-3ms range, simply by using a userspace process with realtime scheduling (SCHED_FIFO). Application examples in the audio field are:

- MIDI sequencers with high precision timing.
- Harddisk recorders / software synths / realtime FX processors

Linux now provides the necessary capabilities to keep up with the performance of pure hardware based solutions.
For example, it's now possible to use your Linux box as an
effect processor with only 5ms latency, with rock solid performance ( = no sound dropouts)
even when the disk is doing heavy I/O.
Linux would now even allow you to run an entire Cubase VST-like application in a 5ms-latency environment.
This would mean that you could use your VST plugins to process the audio input of the soundcard in realtime (with 5ms latency),
while playing back your 50 audio tracks from the disk, all without glitches.
Of course you would hear all parameter changes on EQs/Filters/Plugins/Volume etc. in realtime, like on hardware based solutions.

Look at the excellent results

Now I have only 2 more hopes:

- that the patches get into the mainstream kernel
(it seems that Linus doesn't like Mingo's "re-scheduling hookups" very much; he wants some cleaner solution)

- that audio vendors port their software to Linux (Cubase VST / Emagic Logic / N.I. Reaktor etc.)

There are no excuses anymore (at least from a technical point of view) not to port audio software to Linux.

- Does Windows98 guarantee rock solid <5ms latency ? NO

- Does Windows98 support SMP ? NO


Attention: the stuff below is outdated and provided only for completeness
The tests below are "non-cutting-edge" , and show the performance of standard Linux kernels, and earlier patches

Tests with Linux 2.2.9 + 1000HZ patch

Here are the results on Linux 2.2.9 + 1000HZ patch

I used the 1000HZ patch because on a standard 2.2.9 I sometimes get up to 7.5ms total latency
during the X11 stress test.
With the 1000HZ patch, the latency never went above 2.5ms during this test.
Unfortunately the disk stress tests didn't give better results with HZ=1000, because there is some locking involved.
 

Tests with lowlatency-2.2.10-N2 patch

There are still problems but things are looking better, especially with little RAM.
The problem seems to be that some kernel routines take longer to execute when you have more RAM installed.
I made 3 tests, booting with mem=64m, mem=128m and mem=256m; the audio buffer was 4.35ms (3x256 bytes).
The results are very interesting: (look at the diagrams !)

mem=64m:

 /proc stress:  3.5ms latency    0 overruns  (this is VERY good !)
 disk write:   12.5ms latency   18 overruns
 disk copy:    11.1ms latency   14 overruns
 disk read:     6.3ms latency   16 overruns  (quite nice :-) )

diagrams

mem=128m:

 /proc stress:  5.4ms latency  102 overruns
 disk write:   14.5ms latency   20 overruns
 disk copy:    11.4ms latency   43 overruns
 disk read:    10.5ms latency    7 overruns

diagrams

mem=256m:

 /proc stress:  9.2ms latency  101 overruns
 disk write:   48.7ms latency   13 overruns
 disk copy:    31.6ms latency   24 overruns
 disk read:    11.9ms latency    4 overruns

diagrams

As we see, more RAM leads to bigger latencies, maybe because traversing a large list of pages or buffer chains takes too long.
Roger Larsson measured up to 80ms latency in d_lookup on his Pentium with 512 MB RAM (a relatively slow machine with a large amount of RAM).
When Ingo Molnar comes up with his new patch (hopefully with shorter execution paths), I will post new figures.

Test1 (disk highly tuned , async mode)

highly tuned harddisk with DMA , 32bit transfer  , multicount , unmask-irq activated
( hdparm -m 8 -d 1 -u 1 -c 1 /dev/your-ide-hd )
 

Test results
 

Notes:

The graphics stress performed well: as you can see, using a 4.35ms audio buffer (3 fragments of 256 bytes),
the max total latency of the main loop (CPU loop + audio output) is about 2.5ms, which is about 1ms above the 1.45ms fragment latency.
The pure CPU loop latency differs from the nominal value (1.16ms) by at most 0.5ms, and stays within +/-0.1ms 99.99% of the time.
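
The figures in these notes follow from simple arithmetic on the fragment size. The sketch below assumes 16-bit stereo PCM at 44100 Hz, which is not stated on this page but reproduces its numbers (a 256 byte fragment lasts about 1.45ms); pcm_ms(), fragment_ms(), buffer_ms() and cpu_loop_ms() are hypothetical helper names.

```c
#include <assert.h>

/* Playing time of `bytes` of PCM data, in milliseconds. */
static double pcm_ms(int bytes, int rate_hz, int channels, int bytes_per_sample)
{
    assert(bytes >= 0 && rate_hz > 0 && channels > 0 && bytes_per_sample > 0);
    return 1000.0 * bytes / ((double)rate_hz * channels * bytes_per_sample);
}

/* ASSUMPTION: 44100 Hz, 2 channels, 2 bytes per sample (16-bit stereo). */
static double fragment_ms(int fragment_bytes)
{
    return pcm_ms(fragment_bytes, 44100, 2, 2);
}

/* Total buffer time for `fragments` fragments of `fragment_bytes` each. */
static double buffer_ms(int fragments, int fragment_bytes)
{
    return fragments * fragment_ms(fragment_bytes);
}

/* Nominal CPU loop time: the benchmark burns 80% of one fragment. */
static double cpu_loop_ms(int fragment_bytes)
{
    return 0.8 * fragment_ms(fragment_bytes);
}
```

Under that assumption fragment_ms(256) is about 1.45, buffer_ms(3, 256) about 4.35 and cpu_loop_ms(256) about 1.16, while buffer_ms(3, 128) comes out just under 2.2, which corresponds to the 3x128 buffer quoted as 2.1ms in the news.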
 

/proc file system stress:

You can see that heavy access to /proc causes total scheduling latencies of about 10ms,
but the CPU loop latency differs by at most 0.5ms; therefore I suspect these 10ms delays are due to kernel locking.

disk write test:

You see a maximum latency of 46ms, but I measured up to 130ms in some tests.
These long scheduling delays are quite sporadic (10 deadline misses = buffer overruns) compared to the measured interval (about 50sec), and therefore I think it's a curable problem.
Consider that the total latency stays within +/-1ms 99.86% of the time, which is quite good.
Again the CPU loop latency differs by at most 0.5ms, and therefore I conclude that these long scheduling delays are a spinlock-related problem.
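
The kind of figures quoted in these notes (maximum latency, number of deadline misses, share of time within +/-1ms) can be derived from the recorded loop times along the following lines. This is a minimal sketch: struct latency_summary and summarize() are hypothetical names, not part of latencytest, and an overrun is counted whenever one loop takes longer than the whole audio buffer, as described in the introduction.

```c
#include <assert.h>
#include <stddef.h>

struct latency_summary {
    double max_ms;        /* worst observed loop time */
    size_t overruns;      /* loops longer than the whole audio buffer */
    double pct_in_range;  /* share of loops within +/- tolerance of nominal */
};

/* Summarize n loop times (in ms) against the nominal loop time, a
 * tolerance band around it, and the total audio buffer time. */
static struct latency_summary summarize(const double *loop_ms, size_t n,
                                        double nominal_ms, double tolerance_ms,
                                        double buffer_ms)
{
    struct latency_summary s = { 0.0, 0, 0.0 };
    size_t in_range = 0;

    assert(loop_ms != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        double diff = loop_ms[i] - nominal_ms;
        if (loop_ms[i] > s.max_ms)
            s.max_ms = loop_ms[i];
        if (loop_ms[i] > buffer_ms)
            s.overruns++;              /* deadline missed: audible dropout */
        if (diff >= -tolerance_ms && diff <= tolerance_ms)
            in_range++;
    }
    s.pct_in_range = 100.0 * (double)in_range / (double)n;
    return s;
}
```

For the tests on this page the call would look like summarize(samples, n, 1.45, 1.0, 4.35), with samples holding the measured time3-time1 values in milliseconds.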

disk copy test:

Similar results to the write test, with a few more dropouts (but consider that the amount of data moved is about twice that of the write stress).
The max scheduling latency is about 36ms, but I measured up to 100-130ms in some tests.

disk read test:

Disk read seems to stress the system less than disk write: there are fewer dropouts, although their durations are in the same range.
The max CPU loop latency variation (0.5ms) is similar to that in the write/copy disk stress tests.
 

Test 2 ( disk highly tuned in sync mode)

Disk tuning parameters are identical to the first test, but during this test the disk ran in sync mode

(to mount the disk in sync mode, use   mount / -oremount,sync )

The disk performance is very bad in sync mode, therefore you should run the benchmark with half the filesize for the test file, if you don't want to wait forever :-)

Test results

Notes:

As you can see from the charts, the behaviour is much better. (I think the cause is that since the disk syncs data
more often, there are shorter interruptions, and therefore the scheduling latency of the benchmark is lower.)

The disk I/O tests produce scheduling latencies of up to 20-25ms, which is much lower than in async mode
(about 4-6 times better), but the disk sync mode is not suitable for general use because of the big disk
performance degradation.
 

Test 3 (disk with DMA=off, async mode)

In this test I turned off DMA of my IDE disk with the command
hdparm -d 0 /dev/your-ide-hd

The results are a *scheduling nightmare*

Read the notes below !

Test results

Notes:

Even the little disk activity needed to start the shell script for the testing causes a 10ms delay, which was not present in the DMA=on case.

You are unable to play continuous audio without interruptions during heavy disk I/O,
even using the full 64k DMA buffer of a soundcard:

The disk write test caused a max scheduling latency of *4000ms* !! , VERY BAD
That is 4secs ! Your soundcard would need a 600KB DMA buffer to avoid drop-outs,
so forget about low-latency audio without DMA IDE transfers on Linux.

The disk copy test performed a bit better (2000ms), but still very badly.
The disk read tests gave about 27ms scheduling latencies.

This time in the disk I/O tests the CPU loop latency is VERY HIGH, up to 300ms;
that means that even a RT-FIFO scheduled thread gets interrupted (for a LONG time) by the kernel to perform disk I/O.

In this case the limiting factor doesn't seem to be the locking, but the kernel scheduling itself, which doesn't reschedule the
processes until the data is written to the disk. ( BUSY WAIT ?)

If someone is able to explain the cause of this behaviour, I will post it on this page.
 

How to optimize audio latency in your programs

- use SCSI disks if you can

- if you have EIDE disks, tune all your disks as well as possible
    with the command: hdparm -m 8 -d 1 -u 1 -c 1 /dev/your-ide-hd
  this activates DMA IDE transfer, 32bit mode, multicount mode (multiple block transfers per interrupt) and IRQ unmasking

- use realtime priority for your audio playing app:

for example you can use this routine to set RT FIFO priority (for additional info see man sched_setscheduler)

----
#include <sched.h>
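#include <stdio.h>   /* perror() */
#include <string.h>  /* memset() */

/*
 * Switch the calling process to SCHED_FIFO at maximum priority.
 * Returns 0 on success, -1 on failure.
 */
int set_realtime_priority(void)
{
        struct sched_param schp;

        /*
         * set the process to realtime privs
         */
        memset(&schp, 0, sizeof(schp));
        schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

        if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
                perror("sched_setscheduler");
                return -1;
        }
        return 0;
}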
----
 

- avoid disk I/O as much as possible. If you must perform disk I/O in your audio app, perform it in a separate
thread which does not run at realtime priority, or runs at a lower realtime priority than your audio playing thread.
You can communicate between the disk I/O thread and the audio playing/recording thread using
pthreads, which share memory between the threads, or, if you create regular processes with fork(),
you can use shared memory segments for the communication.

- use a fragment number greater than 2, because you get better buffer utilization.

- don't usleep() in your audio playing code, because usleep() is not very accurate, and when disk I/O occurs
I measured scheduling latencies of up to 150ms !

- don't access the /proc filesystem too much, because it causes about 10ms latency on my PII400;
on a Pentium class machine the latencies could be much higher
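
The advice about keeping disk I/O in a separate thread can be sketched with pthreads as follows. This is an illustration under assumptions, not code from latencytest: the ring buffer, its size and all function names are hypothetical, the disk thread only generates sample values instead of reading a file, and the audio thread sums the samples instead of write()ing them to the soundcard. Neither thread is given realtime priority here, so the sketch runs without root; in a real program the audio thread would get SCHED_FIFO and would block in write() rather than call sched_yield().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SLOTS 1024   /* must be a power of two */

/* Single-producer / single-consumer ring shared by the two threads. */
struct ring {
    short data[RING_SLOTS];
    atomic_size_t head;   /* total samples written by the disk thread */
    atomic_size_t tail;   /* total samples read by the audio thread */
};

/* Disk thread side: returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, short sample)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;
    r->data[head % RING_SLOTS] = sample;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Audio thread side: never blocks; returns 0 if no sample is ready. */
static int ring_get(struct ring *r, short *sample)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *sample = r->data[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

struct job { struct ring *ring; int count; long sum; };

/* Stand-in for the disk I/O thread: produces samples and queues them. */
static void *disk_thread(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->count; i++)
        while (!ring_put(j->ring, (short)(i % 100)))
            sched_yield();          /* ring full: let the consumer run */
    return NULL;
}

/* Stand-in for the audio thread: consumes samples without touching the disk. */
static void *audio_thread(void *arg)
{
    struct job *j = arg;
    int got = 0;
    short s;
    while (got < j->count) {
        if (ring_get(j->ring, &s)) {
            j->sum += s;
            got++;
        } else {
            sched_yield();          /* nothing queued yet */
        }
    }
    return NULL;
}

/* Move `count` samples through the ring; return the sum the audio thread saw. */
static long run_demo(int count)
{
    struct ring ring;
    struct job producer = { &ring, count, 0 };
    struct job consumer = { &ring, count, 0 };
    pthread_t disk, audio;
    int rc;

    assert(count >= 0);
    atomic_init(&ring.head, 0);
    atomic_init(&ring.tail, 0);
    rc = pthread_create(&disk, NULL, disk_thread, &producer);
    assert(rc == 0);
    rc = pthread_create(&audio, NULL, audio_thread, &consumer);
    assert(rc == 0);
    (void)rc;
    pthread_join(disk, NULL);
    pthread_join(audio, NULL);
    return consumer.sum;
}
```

The point of the design is that the audio side never waits on a lock held by the disk side: if the disk thread stalls, the audio thread only sees an empty ring. Depending on the toolchain, the sketch may need to be compiled with the -pthread option.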
 

Send me comments, suggestions and questions.
regards,
Benno.

sbenno@gardena.net.
 

last update: 07/03/99