The main reason I created this page and the benchmark was to analyze
how well Linux handles the scheduling latencies of programs
which must do things in realtime under high system load.
Low latencies are required in the multimedia field, because desktop
users expect smooth audio/video performance on their powerful boxes.
Most of the problems (jerky video/choppy audio) are software related
and mainly caused by the OS scheduler.
With the low latency patch, Linux can compete with or beat most desktop OSes in
this field, allowing latencies as low as 2.1msec, which is especially useful
in the realtime audio field.
BTW: this page is VERY chaotic (for lack of time to make a decent one). I plan a complete redesign and a move to www.linuxdj.com within the next months, so please be patient.
If you are searching for the Linux audio home, please visit the Linux Audio Developer site:
there you will find what's happening on the cutting-edge audio scene, plus links to all related discussions, software, software authors, related sites, archives, docs etc.
Do you want to switch from Windows to Linux and are you looking for information on how to use Linux on your desktop ? Then visit www.LinuxDesktop.it
News:
07/25/00:
low latency 2.2.16 kernel RPMs for Redhat 6.x released ! easy to install.
latencytest-0.42-png with PNG support is now available for those who have problems with the GIF version of the gd lib. Thanks to Chris Baugher (baugher@enteract.com) for the changes.
06/24/00:
latencies of kernel 2.4.0-test2 benchmarked
summary: 2.4.0-test2 is not significantly better than 2.2.x. (worst case latencies still around 100ms)
Why ? Because almost none of Ingo's lowlatency work has made its way into the mainstream kernel.
06/09/00:
don't forget to visit my www.linuxdj.com site
It hosts the Audio Quality HOWTO, the linux-audio-dev homepage
(which talks about the hot latency topic too),
my hdrbench harddisk recording benchmark and other useful resources.
The goal is to provide all the resources needed by Linux developers
and users, so that they can get the maximum out of the OS.
05/12/00:
a low-latency patch for 2.2.15 is available at Ingo's site:
get it here
Notice: this patch has been tested by me on my Celeron box, and it performs
ok on a single processor system. (with a little tweaking you can apply it
to 2.2.16 kernel too, see the news section)
on SMP boxes the performance is still BAD (70ms latencies), but
that will be fixed in 2.3.x , 2.4.x kernels
Unfortunately plain 2.2.15 does not seem to deliver the best disk I/O
throughput, and neither does the latest 2.3.99-pre6-7.
Bad news for us harddisk recording folks. :-(
Anyway, these bugs will surely be fixed very soon.
Montavista is now also working
in the low-latency field on Linux, because they use it for embedded stuff.
They are currently improving the scheduler so that it performs nicely
even when there are hundreds of processes active.
Plus, a couple of weeks ago I exchanged mail with an SGI engineer and he told
me that SGI is interested in realtime 60-120 Hz flicker-free video
(will the next Reality Engine run on Linux ?)
Stay tuned !
I have been told that Steinberg, the maker of Cubase VST and Nuendo, is
very interested in Linux because of its performance and growing demand.
If we are lucky, then within a year or less, we will be able to
run VST plugins with latencies close to their hardware counterparts,
with all the flexibility and performance that Linux brings !
(Hopefully Emagic will port their cool software like Logic to Linux as well)
02/01/00:
released latency-graph API 0.1
which lets you generate diagrams similar to those produced by latencytest
from your application with little effort
09/16/99:
performed tests on an old Pentium, again AMAZING results:
Hardware: P133, Mainboard Tyan Tomcat HX , RAM 64 MB
disk IBM 6GB EIDE, Soundcard TB Tropez plus, video ATI MACH64
Software: RH6.0 + KDE
I was able to get 2.1ms latency, but lowered the CPU load to 60%
because CPU load=80% made the box unusable
(it would be the same as working on a Pentium at 20-25MHz :-) )
look at the RESULTS (3x128 audio buffer)
09/11/99:
latencytest 0.42 released: get it here:
latencytest-0.42.tar.gz
for the PNG version see the News section
new features:
- added benchmarking of the Realtime Clock (RTC device),
using async notification via SIGIO handler (you must apply a small patch to the
kernel to enable this)
Thanks to Paul Barton Davis for this.
A typical use for the RTC device could be a high precision MIDI sequencer.
the max jitter is around 300-500ms under high load,
really impressive !
look at the excellent results:
RTC 2048HZ (0.5ms period) CPU load=80%
RTC 2048HZ (0.5ms period) CPU load=10%
more really good news: I discovered that I can reduce the audio buffer size
from 4.3ms to 2.1ms without losing reliability:
that means audio latency is now down to ONLY 2.1ms !
look at the diagrams, the max jitter was only 700usecs (very sporadic) !
audio buffer 2.1ms (3x128) CPU load=80%
audio buffer 2.1ms (3x128) CPU load=10%
08/28/99:
*EXCITING* NEWS:
things are getting almost perfect !
Ingo's lowlatency-2.2.10-N6 patch with the shm.c part backed out
and a modification of filemap.c (thanks to Roger Larsson)
performs _REALLY_ well,
using my usual latencytest parameters (4.3ms buffer),
I got
NO DROP-OUTS anymore, with sporadic maximum peaks of ONLY 2.9ms
This is really exciting because it opens the doors to a whole new class
of Realtime applications for Linux, simply using userspace processes
scheduled SCHED_FIFO.
I have heard of comparably low latencies only from BEOS;
Windows simply can't guarantee this kind of latency,
not even using DirectX.
Using a soft-synth on Win98 on my box I must use
15-20ms audio buffers to get _SOMEWHAT_ reliable audio.
That is more than 3-4 times the buffer I used for testing
under Linux (4.35ms).
See the testresults here
You can download the lowlatency-2.2.10-N6B patch
here (obsolete and unstable, use the 2.2.15 ones or the kernel RPM)
for a patch for 2.2.13 see below in the Download section
As usual to get these good results don't forget to tune your EIDE disks
with hdparm -d 1 -m 8 -u 1 -c 1
The patch is rock solid: EIDE and SCSI disk access works well,
but unfortunately some problems still remain:
- my ISDN Hisax driver (Fritz Card classic) crashes when I load
the module.
This does not happen on a standard 2.2.10 kernel; it would be nice
if the ISDN people could figure out whether this is due to some sort of race.
I have not found any instability except this ISDN problem.
- Disk performance decreases by about 10-25% under high CPU load;
maybe this is caused by the higher scheduling overhead when
you run lowlatency apps like my latencytest.
07/31/99:
new testresults released using the mingo-lowlatency-2.2.10-N2
patch
get the patch here
The patch is still not perfect but helps very much; the /proc stress is perfect.
On the disk I/O tests there are still some sporadic problems.
With mem=256m I get up to 28ms latency on the disk write test;
performing the same test with mem=64m, the latency goes down to about 11ms.
It seems that the problem is still correlated with the RAM size.
(Does anyone have an idea which nasty kernel routine could be the cause of
this behavior ?)
Otherwise Mingo's patch produces really nice diagrams: the latency jitter
is much smaller than without the patch, just do your own comparison with
a non-patched kernel. :-)
Here the results:
mem=256m
mem=128m
mem=64m
07/31/99:
new testresults released (using a mixed patch from Andrea+Mingo)
put online a latency-profiling patch from Roger Larsson, which
allows you to see which part of the kernel has long execution paths (not tested yet)
get the patch for 2.2.10 here
note that you must use ksyms -a to translate addresses into function names
07/06/99:
latencytest-0.41 released:
added USE_GENERIC_TIMER option to allow compilation on non-Intel
architectures ( uses gettimeofday() instead of RDTSC)
07/03/99:
latencytest-0.4 released:
produces nice scheduling latency analysis charts
Download:
get the benchmark here: latencytest-0.42.tar.gz
for the PNG version see the news section
lowlatency-2.2.10-N6B.patch
(this patch allows <3ms scheduling latencies on PII boxes, even under high disk I/O)
The patch is outdated and unstable, use newer 2.2.15/16 versions, see above.
lowlatency-2.2.13-A1.patch
(note that some people reported that this patch doesn't work as well as the 2.2.10 version
due to some remaining bugs/problems, but Ingo is about to release a definitive patch (which fixes the problems) for
2.2.13 and for 2.3.x (yes, it will get into Linux 2.4 !).
I will post the patches on my site as soon as they are available)
latency-graph API 0.1
which lets you generate latencytest-like diagrams from your application.
Introduction
The main purpose of the program is to measure the scheduling latencies, under high system load, of programs which must do things in realtime.
Currently there are 5 operating system stress classes:
- heavy graphics output , using x11perf to simulate large
BitBlts
- heavy access to the /proc filesystem using "top" with an update frequency of 0.01 sec
- disk write stress ( write a large file to disk)
- disk copy stress (copy a large file to another)
- disk read stress ( read a large file from disk)
I wrote the benchmark to test the realtime audio (PCM I/O) capabilities
of Linux.
In the future I will extend the program to test other subsystems, like
MIDI I/O, serial I/O and the use of usleep()s.
The playing is done strictly from RAM.
The player thread gets RT priority through sched_setscheduler() and is
scheduled with FIFO policy at maximum priority.
The player sits in a loop which basically does the following:
while(1)
{
time1=my_gettime();
/* waste CPU for 80% of the duration of one audio fragment */
time2=my_gettime();
write(audio_fd,playbuffer,fragmentsize);
time3=my_gettime();
}
time3-time1 = duration of one loop (CPU wasting + audio output).
If this time gets bigger than the audio buffer (n fragments) then you will hear an audio dropout.
time2-time1 = duration of the CPU wasting loop. It should be constant at
80% of the fragment duration,
but can vary if some device on the bus (DMA/PCI contention
?) or a kernel I/O routine steals cycles from the CPU.
On some graphics cards, heavy graphics output blocks the bus for too
long, so the process gets blocked too long and
the deadline (in this case the audio buffer duration) is missed.
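The loop described above can be sketched in plain C as follows. This is only a minimal illustration, not the actual latencytest source: waste_cpu(), run_loop(), struct loop_stats and no_output() are hypothetical names, the timer uses gettimeofday() (the portable choice, instead of RDTSC), and the audio write() is replaced by a callback so that the sketch runs without a soundcard.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current time in microseconds, based on gettimeofday(). */
static double my_gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Hypothetical helper: busy-loop for at least `usecs` microseconds. */
static void waste_cpu(double usecs)
{
    double start = my_gettime();
    while (my_gettime() - start < usecs)
        ;
}

/* Stand-in for the audio output; a real player would call
 * write(audio_fd, playbuffer, fragmentsize) here. */
typedef void (*output_fn)(void);
static void no_output(void) { }

struct loop_stats {
    double max_cpu_us;   /* worst time2 - time1 (CPU wasting loop)   */
    double max_total_us; /* worst time3 - time1 (whole iteration)    */
    long   overruns;     /* iterations longer than the audio buffer  */
};

/* Run the measurement loop `loops` times.  The audio buffer holds
 * `fragments` fragments of `fragment_us` microseconds each. */
static struct loop_stats run_loop(long loops, double fragment_us,
                                  int fragments, output_fn output)
{
    struct loop_stats st = { 0.0, 0.0, 0 };
    double buffer_us = fragment_us * fragments;

    assert(loops > 0 && fragment_us > 0.0 && fragments > 0);

    for (long i = 0; i < loops; i++) {
        double time1 = my_gettime();
        waste_cpu(0.8 * fragment_us);      /* 80% CPU load */
        double time2 = my_gettime();
        output();
        double time3 = my_gettime();

        if (time2 - time1 > st.max_cpu_us)
            st.max_cpu_us = time2 - time1;
        if (time3 - time1 > st.max_total_us)
            st.max_total_us = time3 - time1;
        if (time3 - time1 > buffer_us)     /* deadline missed = dropout */
            st.overruns++;
    }
    return st;
}
```

With a real blocking audio write() in place of no_output(), max_total_us is the number that has to stay below the buffer duration; max_cpu_us shows how much the CPU loop itself was disturbed.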
My Hardware:
CPU: PII 400
Mainboard: ASUS P2B BX chipset
Harddisk: IBM Deskstar 16 GB EIDE (UDMA)
Graphics card: Matrox G100 4MB AGP
Note that my harddisk and gfx card don't suffer from the bus blocking
problems
(I tested this under Win98 using 20ms audio buffers during high disk
I/O, and there were no dropouts)
Tests with lowlatency-2.2.10-N6 patch : VERY GOOD RESULTS
As said above, Linux now becomes usable for realtime apps which
need response times in the 1-3ms range, by simply using
a userspace process with realtime scheduling (SCHED_FIFO).
Application examples in the audio field are:
- MIDI sequencers with high precision timing.
- Harddisk recorders/software synths / realtime FX processors
Linux now provides the necessary capabilities to keep up with
the performance of pure hardware based solutions.
For example, it's now possible to use your Linux box as an
Effect processor with only 5ms latency, with rock solid performance ( = no sound dropouts)
even when the disk is doing heavy I/O.
Linux would now even allow running an entire Cubase VST-like application
in a 5ms-latency environment.
This would mean that you could use your VST plugins to process the
audio-input of the soundcard in realtime (with 5ms latency),
while playing back your 50 audio tracks from the disk,
all without glitches.
Of course you would hear all parameter changes on EQs/Filters/Plugins/Volume etc.
in realtime, like on hardware based solutions.
Look at the excellent
results
Now I have only 2 more hopes:
- patches get into the mainstream kernel
(it seems that Linus doesn't like Mingo's "re-scheduling hookups" very much,
he wants some cleaner solution)
- audio vendors port their software to Linux
( Cubase VST / Emagic Logic / N.I Reaktor etc.)
There are no excuses anymore (at least from a technical point of view) not to port
audio software to Linux.
- Does Windows98 guarantee rock solid <5ms latency ? NO
- Does Windows98 support SMP ? NO
Attention: the stuff below is outdated and provided only
for completeness
The tests below are "non-cutting-edge" , and show the performance of standard
Linux kernels, and earlier patches
Tests with Linux 2.2.9 + 1000HZ patch
here the results on Linux 2.2.9 + 1000HZ patch
I used the 1000HZ patch because on a standard 2.2.9 I sometimes get up
to 7.5ms total latency
during the X11 stress test.
With the 1000HZ patch, the latency never went above 2.5ms
during this test.
Unfortunately the disk stress tests didn't give better results with
HZ=1000, because there is some locking involved.
Tests with lowlatency-2.2.10-N2 patch
There are still problems, but things are looking better, especially with
little RAM.
The problem seems to be that some kernel routines take longer to execute
when you have more RAM installed:
I made 3 tests, booting with mem=64m, mem=128m and mem=256m.
The audio buffer was 4.35ms (3x256 bytes).
The results are very interesting: (look at the diagrams !)
mem=64m :
- /proc stress: 3.5ms latency, 0 overruns (this is VERY good !)
- disk write: 12.5ms latency, 18 overruns
- disk copy: 11.1ms latency, 14 overruns
- disk read: 6.3ms latency, 16 overruns (quite nice :-) )
mem=128m : (diagrams)
- /proc stress: 5.4ms latency, 102 overruns
- disk write: 14.5ms latency, 20 overruns
- disk copy: 11.4ms latency, 43 overruns
- disk read: 10.5ms latency, 7 overruns
mem=256m : (diagrams)
- /proc stress: 9.2ms latency, 101 overruns
- disk write: 48.7ms latency, 13 overruns
- disk copy: 31.6ms latency, 24 overruns
- disk read: 11.9ms latency, 4 overruns
Test1 (disk highly tuned , async mode)
highly tuned harddisk with DMA , 32bit transfer , multicount ,
unmask-irq activated
( hdparm -m 8 -d 1 -u 1 -c 1 /dev/your-ide-hd )
Notes:
The graphics stress performed well: as you can see, using a 4.38ms audio
buffer (3 fragments of 256 bytes),
the max total latency of the main loop (CPU loop + audio output) is
about 2.5ms, which is 1ms above the 1.45ms fragment latency.
The pure CPU loop latency differs from the nominal value (1.16ms)
by at most 0.5ms, and stays in the +/-0.1ms range 99.99% of the time
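The buffer figures quoted in these notes can be reproduced with a little arithmetic. The sketch below assumes 44.1 kHz, 16-bit, stereo audio (176400 bytes per second); the page does not state the format, but this assumption reproduces the quoted 1.45ms fragment, 4.35ms buffer and 1.16ms CPU loop values. The function names are made up for illustration.

```c
#include <assert.h>

/* ASSUMPTION: 44100 frames/s * 2 channels * 2 bytes per sample.
 * Not stated on this page; chosen because it matches the numbers. */
#define BYTES_PER_SECOND (44100.0 * 2 * 2)

/* Duration of one audio fragment in milliseconds. */
static double fragment_ms(int fragment_bytes)
{
    assert(fragment_bytes > 0);
    return fragment_bytes * 1000.0 / BYTES_PER_SECOND;
}

/* Duration of the whole audio buffer (n fragments) in milliseconds.
 * This is the deadline: a loop longer than this causes a dropout. */
static double buffer_ms(int fragments, int fragment_bytes)
{
    assert(fragments > 0);
    return fragments * fragment_ms(fragment_bytes);
}

/* Nominal duration of the CPU wasting loop for a given load (0..1). */
static double cpu_loop_ms(int fragment_bytes, double load)
{
    assert(load >= 0.0 && load <= 1.0);
    return load * fragment_ms(fragment_bytes);
}
```

Under this assumption fragment_ms(256) is about 1.45, buffer_ms(3, 256) about 4.35, cpu_loop_ms(256, 0.8) about 1.16, and buffer_ms(3, 128) about 2.18, which corresponds to the "2.1ms" buffer of the 3x128 tests.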
/proc file system stress:
You can see that heavy access to /proc causes total scheduling latencies
of about 10ms ,
but the CPU loop latency differs by at most 0.5ms, therefore I suspect
these 10ms delays are due to kernel locking.
disk write test:
You see a maximum latency of 46ms, but I measured up to 130ms in some
tests.
These long scheduling delays are quite sporadic (10 deadline misses
= buffer overruns) compared to the measured interval (about 50sec), and
therefore I think it's a curable problem.
Consider the fact that the total latency stays in the +/-1ms range
99.86% of the time, which is quite good.
Again the CPU loop latency differs by at most 0.5ms, and therefore I
conclude that these long scheduling delays are a spinlocking related problem.
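Figures like "max latency", "overruns" and "percent of time within +/-1ms" can be derived from the recorded loop durations roughly as follows. This is a sketch of the idea only, not the code latencytest uses; summarize() and struct latency_summary are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct latency_summary {
    double max_ms;           /* longest loop duration seen            */
    size_t overruns;         /* loops longer than the audio buffer    */
    double percent_in_range; /* share of loops within +/- tolerance   */
};

/* total_ms[i] is the measured duration (time3 - time1) of loop i.
 * nominal_ms is the expected duration (one fragment), tolerance_ms the
 * accepted deviation, buffer_ms the deadline (whole audio buffer). */
static struct latency_summary summarize(const double *total_ms, size_t n,
                                        double nominal_ms,
                                        double tolerance_ms,
                                        double buffer_ms)
{
    struct latency_summary s = { 0.0, 0, 0.0 };
    size_t in_range = 0;

    assert(total_ms != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        double deviation = total_ms[i] - nominal_ms;

        if (total_ms[i] > s.max_ms)
            s.max_ms = total_ms[i];
        if (total_ms[i] > buffer_ms)
            s.overruns++;
        if (deviation >= -tolerance_ms && deviation <= tolerance_ms)
            in_range++;
    }
    s.percent_in_range = 100.0 * (double)in_range / (double)n;
    return s;
}
```

For example, four loops of 1.45, 1.5, 2.0 and 46.0 ms with a 1.45ms nominal value, a 1ms tolerance and a 4.35ms buffer give one overrun, a 46ms maximum and 75% of the loops in range.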
disk copy test:
Similar results to the write test, with a few more dropouts (but consider
that the amount of data moved is about twice that of the write stress).
The max scheduling latency is about 36ms, but I
measured up to 100-130ms in some tests.
disk read test:
Disk read seems to stress the system less than disk write: there
are fewer dropouts, although they are in the same range.
The max variation of the CPU loop latency (0.5ms) is similar to that in the
write/copy disk stress tests.
Test 2 ( disk highly tuned in sync mode)
Disk tuning parameters are identical to the first test, but during this test the disk ran in sync mode
(to mount the disk in sync mode, use mount / -oremount,sync )
Disk performance is very bad in sync mode, so you should run the benchmark with half the filesize for the test file if you don't want to wait forever :-)
Notes:
As you can see from the charts, the behaviour is much better. I think
the cause is that, since the disk syncs data
more often, there are shorter interruptions, and therefore the scheduling
latency of the benchmark is lower.
The disk I/O tests produce scheduling latencies of up to 20-25ms, which
is much lower than in async mode
(about 4-6 times better), but the disk sync mode is not
suitable for general use because of the big disk
performance degradation.
Test 3 (disk with DMA=off, async mode)
In this test I turned off DMA of my IDE disk with the command
hdparm -d 0 /dev/your-ide-hd
The results are a *scheduling nightmare*
Read the notes below !
Notes:
Even the little disk activity needed to start the shellscript for the testing causes a 10ms delay, which was not present in the DMA=on case.
During heavy disk I/O you are unable to play continuous audio
without interruptions, even using the full 64k DMA
buffer of a soundcard:
The disk write test caused a max scheduling latency of *4000ms* !!
VERY BAD
That is 4secs ! Your soundcard would need a 600KB DMA buffer to avoid
drop-outs,
so forget about low-latency audio without DMA IDE transfers
on Linux.
The disk copy test performed a bit better (2000ms), but still very
badly.
The disk read tests gave about 27ms scheduling latencies.
This time, in the disk I/O tests, the CPU loop latency is VERY HIGH,
up to 300ms.
That means that even a RT-FIFO scheduled thread gets interrupted (for
a LONG time) by the kernel to perform disk I/O.
In this case the limiting factor doesn't seem to be the locking, but the kernel
scheduling itself, which doesn't reschedule the
processes until the data is written to the disk (BUSY WAIT ?).
If someone is able to explain the cause of this behaviour, I will post
it on this page.
How to optimize audio latency in your programs
- use SCSI disks if you can
- if you have EIDE disks tune all your disks as best as possible
with the command: hdparm -m 8 -d 1 -u 1 -c 1 /dev/your-ide-hd
this activates DMA IDE transfer , 32bit mode, multicount mode
(multiple block transfer per interrupt) and IRQ unmasking
- use Realtime priority for your audio playing app:
for example you can use this routine to set RT FIFO priority ( for additional info see man sched_setscheduler)
----
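#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Give the calling process SCHED_FIFO scheduling at maximum priority.
 * Returns 0 on success, -1 on failure (this usually needs root privileges). */
int set_realtime_priority(void)
{
    struct sched_param schp;

    memset(&schp, 0, sizeof(schp));
    schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

    if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}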
----
- avoid disk I/O as much as possible. If you must perform disk I/O in
your audio app, perform it in a separate
thread which does not run at realtime priority, or runs at a lower realtime
priority than your audio playing thread.
You can communicate between the disk I/O thread and the audio playing/recording
thread using
pthreads, which share memory between the threads, or, if you create
regular processes with fork(),
you can use shared memory segments for the communication.
- use a fragment number greater than 2, because you get better buffer utilization.
- don't usleep() in your audio playing code, because usleep()
is not very accurate, and when disk I/O occurs
I measured scheduling latencies of up to 150ms !
- don't access the /proc filesystem too much, because it causes about
10ms latency on my PII400;
on a Pentium class machine the latencies could be much higher
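One possible way to pass data between the disk I/O thread and the audio thread (the "separate thread" tip above) is a lock-free ring buffer in memory shared by both threads. The sketch below is an assumption of mine, not code from latencytest: struct ring, ring_init(), ring_put() and ring_get() are hypothetical names, RING_SIZE is an arbitrary value, and it relies on C11 <stdatomic.h>, so it needs a compiler that supports it. It is safe only with exactly one writer thread and one reader thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 4096 /* bytes, must be a power of two (illustrative) */

struct ring {
    unsigned char data[RING_SIZE];
    atomic_size_t head; /* bytes written so far, changed only by the disk thread  */
    atomic_size_t tail; /* bytes read so far, changed only by the audio thread    */
};

static void ring_init(struct ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Disk thread: store up to len bytes, return how many fit. Never blocks. */
static size_t ring_put(struct ring *r, const unsigned char *src, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_SIZE - (head - tail);

    if (len > space)
        len = space;
    for (size_t i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];
    /* publish the new data only after it has been copied */
    atomic_store_explicit(&r->head, head + len, memory_order_release);
    return len;
}

/* Audio thread: fetch up to len bytes, return how many were available.
 * Never blocks, so the realtime thread cannot stall on the disk thread. */
static size_t ring_get(struct ring *r, unsigned char *dst, size_t len)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;

    if (len > avail)
        len = avail;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + len, memory_order_release);
    return len;
}
```

The audio thread would call ring_get() once per fragment and play silence (or repeat data) when less than a fragment is available, while the lower-priority disk thread keeps the ring as full as it can with ring_put(). No mutex is taken in either direction, so the SCHED_FIFO thread never waits for the disk thread.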
send me comments , suggestions and questions
regards,
Benno.
sbenno@gardena.net.
last update: 07/03/99