DTrace FAQ
From Genunix
This list contains many actual frequently asked questions for DTrace, and some extra questions of an informative nature. If you would like to provide feedback or suggestions, please post it to dtrace-discuss.
General
Is it "DTrace", "Dtrace" or "dTrace" or "DTRACE"? (And does it matter?)
It's "DTrace" -- regardless of what your word processor auto-correct, Sun's CEO or IBM tries to tell you, respectively. And yes, it matters. ;)
Will DTrace be released for Solaris 9?
No, the effort required is better spent enhancing Solaris further, rather than porting software to older versions. Some sites would love to DTrace their applications to solve long term problems, yet have a lengthy change procedure before Solaris 10 will be put into production. One option is to use Solaris 10 or OpenSolaris for testing and analysis, so that problems can be found there and then fixed on the older Solaris release.
What is DTrace?
A dynamic tracing facility that provides a comprehensive view of operating system and application behaviour. It has functionality similar to truss, apptrace, prex and mdb, bundled into a single scriptable tool that can examine both userland activity and the kernel. DTrace can be used on live production servers with often negligible impact on performance.
What is DTrace used for?
Performance analysis, observability, troubleshooting and debugging. Examples include watching disk I/O details live, and timing userland functions to determine hotspots.
Who will use DTrace?
Firstly, you need to be root or have one of the DTrace privileges to be able to invoke DTrace.
- Sysadmins can use DTrace to understand the behaviour of the operating system and applications.
- Application Programmers can use DTrace to fetch timing and argument details from the functions that they wrote, both in development and from live customer production environments.
- Kernel and Device Driver Engineers can use DTrace to debug a live running kernel and all its modules, without needing to run drivers in debug mode.
Do I need to know kernel internals to use DTrace?
No, although it can help. The following points should explain:
- People may get value from DTrace by using the pre-written and documented scripts from the DTraceToolkit, or from the scripts and one-liners documented in Solaris Performance and Tools - and not need to write DTrace scripts themselves. However, people are encouraged to write their own custom scripts, which can prove more effective at solving their own site issues.
- There are many high level "providers" that are carefully designed to provide a succinct, stable and documented abstraction of the kernel (see the DTrace Guide, eg: proc, io, sched, sysinfo, vminfo), which make tracing the kernel much easier than it may sound.
- No kernel knowledge is required to study user-level application code only. Application developers can study the functions that they wrote, and that they are already familiar with.
- Understanding the Solaris kernel is necessary for writing advanced DTrace scripts for which there is currently no high level provider; for example, to examine TCP and IP activity in detail. The new book, Solaris Internals 2nd Edition, is highly recommended.
Do I need privileges to use DTrace?
Yes, certain privilege groups allow DTrace to function. Generally, the dtrace_user privilege set allows access to the syscall and profile providers, and the dtrace_proc set will add access to the pid provider and any USDT-based providers. For example, without access to the dtrace_proc privilege set, you would lack the ability to monitor probes provided by DVM agents inside a running Java Virtual Machine, but could still see any syscall probes. If you're unsure of which privileges you currently have, running 'ppriv $$' at the command prompt will show what privileges your shell has been granted.
What are some DTrace success stories?
While DTrace has had countless wins, so far there are few that are documented online. (Those of us doing DTrace consulting are bound by NDAs, and can't broadcast customer site details to the world). For now, please see the following:
- Jarod Jenson, the world's most experienced DTrace consultant, has been interviewed by SysAdmin Magazine and ACM. These interviews shed much light on the real-world success of DTrace.
- Brendan Gregg has documented some interesting examples of DTrace analysis.
Does DTrace exist for other Operating Systems?
No, not yet. DTrace was invented by the Sun engineers Bryan Cantrill, Mike Shapiro and Adam Leventhal for Solaris 10 and OpenSolaris. There is a project underway to port DTrace to FreeBSD, which already has some working functionality. (Many of us would like to see DTrace on other OSes, and encourage it!)
However, Apple has announced that DTrace will be "leveraged" in the developer tools for Mac OS X 10.5, aka "Leopard", due in Autumn 2007.
Wasn't this invented 20 years ago on mainframes?
No! DTrace can dynamically trace every function entry and return in the live kernel (around 36,000 probes); plus every function in user-level application code and libraries (for example, mozilla + libraries is over 100,000 probes); and user-level instructions (over 200,000 probes - just for the Bourne shell). Please read through the documentation on this DTrace site, and if you are still not convinced - email us on dtrace-discuss.
Will Sun ever release the source code?
Yes, Sun did in January 2005, and it was the first major component of the Solaris source that was released.
Are there training courses for DTrace?
- Sun Education offers a 3 day DTrace course worldwide, SA-327-S10.
- Context-Switch has offered 3 day DTrace workshops in the past, in the UK.
Are there books for DTrace?
Yes! There are two excellent books available,
- The DTrace Guide is a superb reference for DTrace which covers the language and the providers, and is packed with examples. It was written by the DTrace engineers, and is the authoritative reference. This entire book is available online in both HTML and PDF format, at no charge. A hardcopy is available to purchase from iUniverse.
- Solaris Performance and Tools demonstrates using DTrace in practical ways for performance observability and debugging. It was written by Richard McDougall and Jim Mauro (who also wrote Solaris Internals), and Brendan Gregg (DTraceToolkit).
D Language
What language is D most like?
The D programming language is based on C, and so any background in C programming will help. D is also arguably far easier than C, as only a small number of functions and variable types need to be learned to be able to write powerful scripts.
D programs are similar in form to awk programs: rather than running top-down, they are action based, with each action running when the probe it is attached to fires.
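The action-based form can be sketched as follows. This is a minimal illustration, not a script from the DTraceToolkit; the choice of read(2), the process name "bash" and the aggregation name @reads are assumptions made for the example.

```d
#!/usr/sbin/dtrace -s

/*
 * Each clause has the form:
 *     probe-description /optional predicate/ { actions }
 * much like an awk pattern { action } pair.
 */

/* Runs once, when tracing starts. */
dtrace:::BEGIN
{
	printf("Counting read(2) calls by bash... Hit Ctrl-C to end.\n");
}

/*
 * The probe fires on every read(2) entry, but the action only runs
 * when the predicate is true. "bash" is an assumed example name.
 */
syscall::read:entry
/execname == "bash"/
{
	@reads[pid] = count();
}

/* Runs once, when tracing ends. */
dtrace:::END
{
	printa("PID %d made %@d read(2) calls\n", @reads);
}
```

Saved to a file (say readcount.d, a hypothetical name), it would be run as root with dtrace -s readcount.d.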
What are Probes and Providers?
A probe is an instrumentation point that can be traced by DTrace. For example, the probe "syscall::read:entry" fires when a read(2) syscall is called, and "syscall::read:return" fires when a read(2) syscall completes. There are four components to the probe name, provider:module:function:name. Provider is the most significant; the roles of the other components are explained in the DTrace guide.
A provider is a collection of related probes, much like a library is a collection of functions. For example, the "syscall" provider provides probes for the entry and return for all system calls. The DTrace guide lists the providers as separate chapters.
How do I DTrace ...?
Syscalls
System calls can be easily traced using the syscall provider, which provides a probe for both the entry and the return of the syscall, and variables for the entry arguments and the return code. As the midway point between user-land and the kernel, the syscall interface often reflects application behaviour well. Each syscall is also well documented in section 2 of the man pages. The following are some example DTrace one-liners.
Files opened by process name,
# dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
dtrace: description 'syscall::open*:entry ' matched 2 probes
CPU ID FUNCTION:NAME
0 6329 open:entry df /var/ld/ld.config
0 6329 open:entry df /usr/lib/libcmd.so.1
0 6329 open:entry df /usr/lib/libc.so.1
0 6329 open:entry df /etc/mnttab
Syscall count by process name,
# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 228 probes
^C
svc.startd 1
mozilla-bin 26
sshd 58
bash 88
dtrace 95
df 108
Syscall count by syscall,
# dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 228 probes
^C
lwp_self 1
[...]
write 33
sigaction 33
lwp_sigmask 53
ioctl 95
Of particular value may be to measure the elapsed time and on-CPU time of system calls, to both explain response time and CPU load. The procsystime tool from the DTraceToolkit does this using the -e and -o flags.
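The idea behind such timing can be sketched in a few lines of D. This is a simplified illustration under stated assumptions, not the procsystime source: it traces a single process passed with -p or -c, and the aggregation names @elapsed and @oncpu are made up for the example.

```d
#!/usr/sbin/dtrace -s

/* Record wall-clock and on-CPU timestamps as each syscall begins. */
syscall:::entry
/pid == $target/
{
	self->ts = timestamp;	/* nanoseconds, wall clock */
	self->vts = vtimestamp;	/* nanoseconds this thread has spent on-CPU */
}

/* On return, sum both deltas per syscall name. */
syscall:::return
/self->ts/
{
	@elapsed[probefunc] = sum(timestamp - self->ts);
	@oncpu[probefunc] = sum(vtimestamp - self->vts);
	self->ts = 0;
	self->vts = 0;
}

dtrace:::END
{
	printf("Elapsed time (ns) by syscall:\n");
	printa(@elapsed);
	printf("\nOn-CPU time (ns) by syscall:\n");
	printa(@oncpu);
}
```

It would be run as, for example, dtrace -s syscalltime.d -p PID (the filename is hypothetical). A large elapsed time with a small on-CPU time suggests the syscall spent most of its time waiting rather than computing.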
Disk I/O
Disk events can be traced using the io provider, which provides probes for the request and completion of both disk and client NFS I/O. Each probe provides extensive details of the I/O through the args[] array, as documented in the DTrace guide. The following lists the disk related probes,
# dtrace -ln 'io:genunix::'
ID PROVIDER MODULE FUNCTION NAME
9571 io genunix biodone done
9572 io genunix biowait wait-done
9573 io genunix biowait wait-start
9582 io genunix default_physio start
9583 io genunix bdev_strategy start
9584 io genunix aphysio start
Points to bear in mind when using the io provider for tracing disk activity:
- These are actual disk I/O requests. Your application may be doing loads of I/O which is being absorbed by the file system cache.
- I/O completions (io:::done) are asynchronous, so pid and execname will not identify the responsible process.
- Disk write requests (io:::start) often occur asynchronously to the responsible process, as the file system caches the write and flushes it to storage at a later time.
- io events don't necessarily mean that disk heads are moving somewhere - many disks have buffers to cache I/O activity, especially storage arrays.
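Because completions are asynchronous, a common approach is to match each io:::start with its io:::done using the device and block number rather than the process. The following is a minimal sketch along the lines of the io provider examples in the DTrace guide; the array name start and the aggregation name @latency are chosen for illustration.

```d
#!/usr/sbin/dtrace -s

/* Remember when each I/O was issued, keyed by device and block number. */
io:::start
{
	start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

/*
 * On completion, look up the matching start time. pid and execname
 * are deliberately not used here, since io:::done is asynchronous.
 */
io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
	this->delta = timestamp - start[args[0]->b_edev, args[0]->b_blkno];

	/* Distribution of I/O time in microseconds, per device. */
	@latency[args[1]->dev_statname] = quantize(this->delta / 1000);

	start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
```

The times reported include any time the request spent in device buffers or array caches, so short times do not necessarily reflect physical disk head movement.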
The following are some example one-liners.
Disk I/O size by process ID,
# dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->b_bcount); }'
dtrace: description 'io:::start ' matched 6 probes
CPU ID FUNCTION:NAME
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 4096
0 9583 bdev_strategy:start 8238 tar 4096
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 2048
Disk I/O size aggregation by process name,
# dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'
dtrace: description 'io:::start ' matched 6 probes
^C
tar
value ------------- Distribution ------------- count
512 | 0
1024 |@@ 37
2048 |@@@@@@@ 114
4096 |@@@@@@@ 116
8192 |@@@@@@@@@@@@@@@@@ 286
16384 |@@ 33
32768 |@@@@@ 87
65536 | 0
The DTraceToolkit contains many tools for analysing disk I/O, including:
- iosnoop - snoop I/O events as they occur
- iotop - display top disk I/O events by process
- bitesize.d - print disk event size report
- iofile.d - I/O wait time by filename and process
- iopattern - print disk I/O pattern
- seeksize.d - print disk seek size report
Error Messages
DTrace requires additional privileges
You must either be root or have additional privileges to be able to use DTrace. Those privileges are:
- dtrace_user - allows the use of the profile and syscall providers, on processes that the user owns.
- dtrace_proc - allows the use of the pid provider and USDT-based providers, on processes that the user owns.
- dtrace_kernel - allows most providers to probe everything, in read only mode.
Privileges can be added to a process (such as a user's shell) temporarily by using the ppriv(1) command. For example, to add dtrace_user to PID 1851,
ppriv -s A+dtrace_user 1851
usermod can be used to make this a permanent change to a user account. For example,
usermod -K defaultpriv=basic,dtrace_user brendan
drops on CPU #
dtrace: 864476 drops on CPU 0
dtrace: 2179050 drops on CPU 0
dtrace: 1343451 drops on CPU 0
The DTrace kernel buffer is overflowing due to output being generated too quickly for /usr/sbin/dtrace to read. This usually happens when your script would output hundreds of screens of text per second. Some remedies:
- Increase the switchrate of /usr/sbin/dtrace, so that rather than reading the buffer at 1 Hertz (the default), it reads the buffer faster. At the command line this can be set with -x switchrate=10hz.
- Increase the size of the DTrace primary buffer. By default this is usually 4 Mbytes per CPU. At the command line it can be increased, eg -b 8m.
- Do you really want that much data to be output? Try to probe fewer events. Also, aggregations can be used so that DTrace summarises the data and outputs only the final report, avoiding an output buffer overflow.
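These remedies can also be combined inside a script. The sketch below is an illustration only: the pragma lines are the in-script equivalents of -x switchrate=10hz and -b 8m, and the 5 second interval and the aggregation name @calls are arbitrary choices. Since the data is summarised in an aggregation rather than printed per event, the buffer options matter far less here; they are shown for form.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz	/* read the buffers 10 times a second */
#pragma D option bufsize=8m		/* 8 Mbyte principal buffer per CPU */

/* Summarise in the kernel instead of printing a line per event. */
syscall:::entry
{
	@calls[execname, probefunc] = count();
}

/* Print a short report every 5 seconds, then reset the counts. */
profile:::tick-5sec
{
	printa("%-16s %-24s %@d\n", @calls);
	trunc(@calls);
}
```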
invalid address (0x...) in action
# dtrace -n 'syscall::open:entry { trace(stringof(arg0)); }'
dtrace: description 'syscall::open:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 6329: syscall::open:entry):
invalid address (0xd27f7a24) in action #1
dtrace: error on enabled probe ID 1 (ID 6329: syscall::open:entry):
invalid address (0xd27fbf38) in action #1
This error is caused when DTrace attempts to dereference a memory address which isn't mapped. In the above example, the arg0 variable for the open(2) syscall refers to a user-land address, but DTrace executes in the kernel address space; this example can be fixed by changing stringof to copyinstr. Some remedies:
- Use either copyin() or copyinstr() to copy the data from user-land into the kernel.
- Attempt to dereference on the return of a function, not the entry. On the entry, an address may be valid but not faulted in.
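Both remedies can be applied together: save the user-land pointer on entry, then copy the string in on return, by which time the page has normally been faulted in. A minimal sketch, where the thread-local variable name self->path is chosen for illustration:

```d
#!/usr/sbin/dtrace -s

/* On entry, only remember the user-land address; do not dereference it. */
syscall::open:entry
{
	self->path = arg0;
}

/*
 * On return, copy the string from user-land into the kernel with
 * copyinstr(). On the return probe, arg0 holds the return value.
 */
syscall::open:return
/self->path/
{
	printf("%s opened %s (returned %d)\n",
	    execname, copyinstr(self->path), (int)arg0);
	self->path = 0;
}
```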
failed to create probe ... Not enough space
DTrace ran out of RAM when trying to create probes. This can happen if you attempt to probe far too many events. For example, here we leave fields blank in our probe description (wildcards), and so our probe description will attempt to match every instruction from every function of mozilla (which would be millions of probes),
# dtrace -ln 'pid$target:::' -p `pgrep mozilla-bin`
dtrace: invalid probe specifier pid$target:::: failed to create probe in process 7424: Not enough space
#
In this case, perhaps we meant to probe just function entries - pid$target:::entry, or perhaps instructions from just one library - pid$target:libaio::.
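A narrowed probe description might look like the following sketch. It assumes the target process has libc loaded (libc is used here instead of libaio only because almost every process maps it), and the top-20 cut-off is an arbitrary choice.

```d
#!/usr/sbin/dtrace -s

/*
 * Function entries only, and from one library only: this creates one
 * probe per libc function rather than one per instruction.
 */
pid$target:libc::entry
{
	@calls[probefunc] = count();
}

/* Keep only the 20 most frequently called functions in the report. */
dtrace:::END
{
	trunc(@calls, 20);
}
```

It would be run as, for example, dtrace -s libcalls.d -p PID, where the filename is hypothetical.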
