InstrumentationDaily

Wednesday, June 21, 2017

More work on instrumentation, with the idea of moving more stuff to the recorder, so that it can be leveraged between projects.

Flight recorder

There was a bug that was introduced by this update to the ring code, and generated occasional data miscompare errors in the unit tests. There were three problems.

  • One was that moving from unsigned to uintptr_t means that integer promotion (automatically conversion of some expressions to int) may no longer happen in some expressions. Instead, we may get some other conversion to a larger type, including unsigned. The net result is that one expression that used to work OK with unsigned because it was implicitly comparing int values, i.e. signed comparison, turned into an unsigned comparison, which did not work. So I made the signed cast explicit. That only affected the test, though. Fixed in this commit.
  • Another one is that I botched the rewritten calls to the block-test functions, which could cause the writes to become blocking if there was no blocking function. This was fixed in this commit.
  • Another more serious issue was introduced when I rewrote the code to ensure that data read was contiguous for each call to ring_read. This lead to a premature udpate of the reader index, which allowed a writer to overwrite data we were reading. This was fixed in this commit.

Moral of the story: unit testing is good

Recorder designDaily

Tuesday, June 20, 2017

Blog update

Updated the blog article with recent updates, e.g. switching to C.

Recorder design

Spent some time writing up a recorder design write up, comparing it to the QEMU tracing.

My flight recorder is slightly different compared to what is called a “flight recorder” in the QEMU talk. The way I understand it, you build different QEMU variants with different instrumentation being generated, e.g. “simple” or “stderr” or “nop”. The “flight recorder” mode corresponds to a systemtap build activating selected probes during the init script.

By contrast, my intent is to provide instrumentation that is inexpensive enough to be always on, including in production systems. While the code can be disabled, the only use-case for disabling it should be for performance validation runs. In HPVM, where we used a circular-buffer flight-recorder, I measured overheads below 0.2% for workloads such as SDET or TPC, and that was with all fault handlers instrumented.

Why leave instrumentation in all the time? The reason is to provide meaningful data about the context in case of a fault or crash by “dumping” things that were recorded earlier. So when there is a SIGBUS or assert, you don’t get just a stack trace or a “CRITICAL” message, you get a detailed history of what led you there. Circular buffers being limited in size, you will only get the last N events of each type, but my implementation will sort them according to a shared global order, so for some infrequent, but important events (e.g. program options), that history can go hours or days back.

Another observation is that existing instrumentation often takes the form of printf-lookalikes. Something like FOO_DEBUG(“foo=%d ”, foo). Often, it is categorized, i.e. you may have FOO_DEBUG and BAR_DEBUG with different message prefixes, or (unfortunately more rarely) FOO_DEBUG(foo_category, “foo=%d”, foo). In order to be able to leverage this existing instrumentation in various projects, I designed my flight recorder to take this kind of printf-style input.

There are obviously caveats and limitations. You don’t want to format or allocate memory in the probe. So the printf-format will be rendered late if at all, meaning for example that you have to be careful about %s arguments, and you have a maximum number of arguments (4 in the current implementation).

Now, if we assume that, if only for the flight-recorder, the code contains instrumentation that looks like

	MY_DEBUG(category, “foo=%d bar=%zu”, foo, bar)
then you can do a number of things with this beyond the flight-recorder.

  • Obviously, nop is an option
  • Recording the message in the circular buffer for “category” should be the default operation. It enables crash dumps, but also recorder dumps while in a debugger, or interactively when something “funny” happens e.g. by sending a signal to the application (as simple as hitting Control-T on BSD/macOS)
  • You can test a per-category flag, and print messages of that category if the flag is set. This lets developers show relevant traces without having to filter out irrelevant ones (without being lost, they can still be shown with a recorder dump should they later prove useful)
  • The definition of the macro being in one place, you can #ifdef it to death. So there is no conceptual problem with having conditional code for <sys/sdt.h> style probes, etc. Now, each MY_DEBUG statement automatically becomes a systemtap probe probe, with no additional development cost.
  • Once data is in circular buffers, you can have a low-priority thread that copies the buffers to some file or pumps messages through some standard logging mechanism. Some filtering may happen at that stage (e.g. based on categories, or general concepts such as debug / warning / error. That way, logging is never a bottleneck for the main application threads.
  • A bit more difficult, I’m still working on this one… You can export selected “channels” of data from the ring buffers, to be consumed for example by external visualization tools. This is the place where I think that things like a common trace format could be leveraged, although this seems complex and it's not very clear that it's widely used.

The “printf-format” constraint makes this a bit more difficult. My current thinking is that you’d have a separate “data export” declaration for recorders that would decide what data to expose to the external tool, with information such as name and measurement unit.

Reaction to build

There was a story on Slashdot about 'build'. There were of course a few nasty comments. But there was also a nice one:

Wow!

I'm slashdot user #15,884 and my make-based autoconfiguration system is at least 3x as long. I could really learn from this guy.

macSpice committed

The macOS fixes for Spice-GTK are in. I still need a special environment, which looks like:

#!/bin/sh
export PATH=$PATH:/usr/local/Cellar/gettext/0.19.8.1/bin
export CFLAGS="-ObjC -g"
export CXXFLAGS=" -g"
export CPPFLAGS="-I/usr/local/opt/jpeg-turbo/include -I/usr/local/include -I/usr/local/Cellar/openssl/1.0.2j/include"
export LDFLAGS="-g -L/usr/local/opt/jpeg-turbo/lib -L/usr/local/lib -L/usr/local/Cellar/openssl/1.0.2j/lib"

Christophe Fergeau and I discussed to see if there is a better way to do this. For example, the -ObjC option derives from an error building vncdisplaykeymap.c without it:

In file included from vncdisplaykeymap.c:95:
In file included from /usr/local/Cellar/gtk+3/3.22.7/include/gtk-3.0/gdk/gdkquartz.h:23:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/AppKit.framework/Headers/AppKit.h:10:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h:8:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObjCRuntime.h:492:1: error:
      expected identifier or '('
@class NSString, Protocol;
^

Arguably, that can be interpreted either as:

  • an issue with using a .c extension for code that depends on Objective-C in vncdisplaykeymap.c. So maybe the Cocoa-dependent code in the macOS version should be placed in some vncdisplaykeymap.m file.
  • or an issue with the .pc files for gtk. If the intent is that file <gdk/gdkquartz.h> can be included from a C source code, then that should be reflected by pkg-config --cflags:
    % pkg-config --cflags --libs gtk+-3.0
    

    -D_REENTRANT -I/usr/local/Cellar/gtk+3/3.22.7/include/gtk-3.0 -I/usr/local/Cellar/glib/2.52.2/include/gio-unix-2.0/ -I/usr/local/Cellar/cairo/1.14.8/include/cairo -I/usr/local/Cellar/libepoxy/1.3.1/include -I/usr/local/Cellar/pango/1.40.3/include/pango-1.0 -I/usr/local/Cellar/harfbuzz/1.4.1/include/harfbuzz -I/usr/local/Cellar/pango/1.40.3/include/pango-1.0 -I/usr/local/Cellar/atk/2.22.0/include/atk-1.0 -I/usr/local/Cellar/cairo/1.14.8/include/cairo -I/usr/local/Cellar/pixman/0.34.0/include/pixman-1 -I/usr/local/Cellar/fontconfig/2.12.1_2/include -I/usr/local/opt/freetype/include/freetype2 -I/usr/local/Cellar/libpng/1.6.29/include/libpng16 -I/usr/local/Cellar/gdk-pixbuf/2.36.6/include/gdk-pixbuf-2.0 -I/usr/local/Cellar/libpng/1.6.29/include/libpng16 -I/usr/local/Cellar/glib/2.52.2/include/glib-2.0 -I/usr/local/Cellar/glib/2.52.2/lib/glib-2.0/include -I/usr/local/opt/gettext/include -I/usr/local/Cellar/pcre/8.40/include -L/usr/local/Cellar/gtk+3/3.22.7/lib -L/usr/local/Cellar/pango/1.40.3/lib -L/usr/local/Cellar/atk/2.22.0/lib -L/usr/local/Cellar/cairo/1.14.8/lib -L/usr/local/Cellar/gdk-pixbuf/2.36.6/lib -L/usr/local/Cellar/glib/2.52.2/lib -L/usr/local/opt/gettext/lib -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lintl -Wl,-framework -Wl,CoreFoundation

Christophe will ask the maintainers of gtk-osx.

Flight recorderDaily

Monday, June 19, 2017

nanotoconf

I finally fixed issues in 'build' with Windows file names. Also made some more minor changes, udpated the documentation, and published that over the week-end. This means that 'build' now has some support for auto-configuration in a manner similar to autoconf. I'm tempted to rename the project as nanotoconf

That got me a few comments, including a request to add cross-compilation. This is not complicated.

But I'm surprised to see that cross-compilers do not select the correct include paths and libraries without a --sysroot option:

build> cat test.c
#include 
int main()
{
    printf("Hello worldn");
}
build> arm-linux-gnu-gcc test.c -o /tmp/test
test.c:1:10: fatal error: stdio.h: No such file or directory
 #include 
          ^~~~~~~~~
compilation terminated.

Why do you need to specify the -I option, e.g.:

build> arm-linux-gnu-gcc -I /usr/arm-linux-gnu/include -c test.c

What is the point of having a cross-compiler wrapper if you still have to manually search for include files and libraries?

Spice

More discussion on the why and how of a flight recorder, and its applications for Spice.

Flight recorder

Still looking for a good way to export performance-like data. The fact that the flight recorder uses a printf-like format is a bit problematic for that objective.

Clearly, I don't want to have multiple flight recorder instrumentation probes. I would like the same instrumentation probe to provide data for all the use cases.

On this topic, interesting talk about tracing in QEMU:

Coworking day in SophiaDaily

Thursday, June 15, 2017

Coworking day in Sophia Antipolis, at the "Business pole". Not expecting a very productive day, but it's more fun.

Mail flush

Flushing emails, most notably backlog on code reviews.

Mac Reboot

Running out of disk space regularly. Apparenlty, that's a result of having some Spicy running too long that ate enough memory to eat into my disk space. MacOS seems to be more resilient to this situation than Linux, at least showing some nice error messages that explains more or less what is going wrong. But it still does not recover from this condition too well.

Trace selection

Began sorting and classifying the SPICE_DEBUG instances, while thinking about the comments that were made about the approach.

Recorder

Added automated conversion for floating-point values using the _Generic keyword added in C11.

I noticed that the following code does not seem to be acceptable:

int foo(float);
const char *strsignal(int);

#define MACRO(arg) _Generic(arg, const char *: 0, char *: 1, float: foo(arg))

int test() { int x = MACRO(strsignal(3)); }

Apparently, even values that are not evaluated, such as foo(arg), must still compile even if it is not used for the type being considered. As a result, I had to rework the code to return a different, always valid function name, and pass arg every time:

#define RECORDER_ARG(_1,_2,_3,_4, arg,...)
    _Generic(arg,
             unsigned char:     _recorder_unsigned,
             unsigned short:    _recorder_unsigned,
             unsigned:          _recorder_unsigned,
             unsigned long:     _recorder_unsigned,
             unsigned long long:_recorder_unsigned,
             signed char:       _recorder_signed,
             signed short:      _recorder_signed,
             signed:            _recorder_signed,
             signed long:       _recorder_signed,
             signed long long:  _recorder_signed,
             default:           _recorder_pointer,
             float:             _recorder_float,
             double:            _recorder_double) (arg)

This is somewhat less efficient, as that requires going through an level of indirection even if what I'm doing is a simple cast or even a no-op.

Reviewing C++11 codeDaily

Wednesday, June 14, 2017

I had the pleasure to review some C++11 code today, conversion from old C code. It was interesting to see a combination of very C-esque efficient code, e.g. use of strcmp, with at the same time for (auto &foo : bar) loops, or uses of final and override.

That made the review quite long, but quite interesting.

One more attempt to explain the things that I don't see in Spice and would like to see:

  • a logging system that knows how to record stuff and print it out efficiently.
  • a tracing system that lets me view selected things as they happen, including in hot paths
  • a flight-recorder recording information continuously and dumping useful data after specific events, e.g. crash, signal, assert, etc.
  • performance probes, to continuously measure key things and report them
  • a tuning / tweaking system, to adjust internal parameters, ideally on a running system, and observe the effect

The follow-on emails show that at least two key maintainers of the product see no benefit in this, and keep attributing that to me not wanting to use "the right tools".

Latency explorationDaily

Tuesday, June 13, 2017

Second day of Vincent's baccalaureat. Today is English.

Did some exploration of the performance impact of adjusting the point where we drop frames. I did not understand anything of what I was getting out of Spicy, until Frediano pointed out that he had added a "tweak" of his own in the server that was entirely throwing off the mmtime computation.

Finally understood the various components of the exploration.

Still more discussions about the tweaks and trace selection, trying to make sense of the "counter proposal". This is dragging on, not productive at all.

BaccalaureatDaily

Monday, June 12, 2017

Today is the first day of Vincent's baccalaureat. Spanish test.

Still answering discussions. This has been dragging on, painfully, since June 6. But now it's getting in a territory I don't like at all, that I find borderline unprofessional.

Valid opinion. Don’t like trace options, don’t have to use

them. But why prevent me from using them?

Cluttering the code because you can't use proper tools is not acceptable imho.

I saw that as an unwarranted ad-hominem attack. Responded, explictly asked the question that I am interested in:

  1. What categories do we want?
  2. How do spread the categorization work?
  3. How much do today’s developers depend on existing output?
  4. What categories do we request e.g. in bug reports (IOW, what should --spice-debug select)
  5. What tracing options and tweaks are potentially dangerous for end-users (e.g. what could expose security information)
  6. Consequently, what tracing options do hide in release builds, current POV ranging from “as many as possible" to “none of them”
  7. What are the existing hard-coded parameters? Explicit (e.g. 400ms delay in the server) or implicit (e.g. unbounded memory usage)
  8. Among these, which ones would we like to tweak?
  9. Among those we want to tweak, which ones would deserve dedicated options or env variables, which ones would be perfectly OK with a more generic option like the -t I proposed.

But this is really annoying, and wasting a lot of my time (and probably other people too). And my proposal prompted a counter proposal from one of the maintainers. Great!

Or rather, that would be great if the counter-proposal addressed the problem I want and need to solve.