SPICE server crash analysis

Wednesday, December 13, 2017

Working on the SPICE server crash analysis.

SPICE server crash

Checked that it crashes with any VM, so I don't have to kill my Win10 guest. I can crash it before the boot menu in my Fedora27 guest, which greatly lowers the chances of modifying the disk and leaving it in a damaged / unbootable state.

macOS SPICE performance issue

Since the Win10 guest does not have the streaming agent at the moment, I can compare the performance of the macOS SPICE client, the macOS Microsoft Remote Desktop viewer and the Linux SPICE client. And indeed, I believe that David was right, there is a performance issue with the macOS client. The screen updates from left to right, very slowly. This is particularly visible during the update animations, which change the color of the whole screen.

Win10 guest on Turbo

Still trying to set up the Win10 VM on Windows. It feels sluggish, whether I use SPICE or Microsoft Remote Desktop to access it. I bumped it to 8 CPUs and 8G of memory, but it still does not feel fast. I think that the network is somewhat problematic, but there are also tons of disk I/Os, as evidenced by the hard disk noises.

Unfortunately, I triggered the SPICE server crash by mistake on the Windows 10 VM while I was installing stuff in MSYS2 with pacman. (As an aside, finally a convenient package manager on Windows.) After that, it somehow refused to let me log in. It was presenting my regular user name, but I was in the "Other user" box, and until I clicked on my user name, I could not log in. Weird.

Pacman apparently recovered quite well from being interrupted, I just had to remove a lock file. Installed git, gcc, make, cmake, llvm (3.8.0svn).

Meanwhile, I'm also setting up a backup VM on Muse with similar setup, i.e. MSYS2 and development tools.

ssh for Windows 10

Activated ssh server on Windows 10. Typing ssh and ending with a Windows command prompt... Aaaargh!!!

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\dinec>
C:\Users\dinec>ls

Of course, ls does not work.

LLVM versions

Installed multiple variants of LLVM on Muse to see if it's possible to have multiple versions coexist on the same system. Apparently, that works. Muse now has the following:

Installed:
  llvm3.7-devel.x86_64 3.7.1-7.fc27
  llvm3.9-devel.x86_64 3.9.1-11.fc27
  llvm34-devel.x86_64 3.4.2-10.fc26
  llvm35-devel.x86_64 3.5.2-4.fc26
  llvm3.7.x86_64 3.7.1-7.fc27
  llvm3.7-libs.x86_64 3.7.1-7.fc27
  llvm3.9.x86_64 3.9.1-11.fc27
  llvm3.9-libs.x86_64 3.9.1-11.fc27
  llvm34.x86_64 3.4.2-10.fc26
  llvm34-libs.x86_64 3.4.2-10.fc26
  llvm35.x86_64 3.5.2-4.fc26
  llvm35-libs.x86_64 3.5.2-4.fc26

Also installed multiple variants of LLVM on Ptitpuce, using brew. I now have LLVM 3.7.1, 3.8.1, 3.9.1, 4.0.1, 5.0.0 and 6.0.0svn installed in parallel.

On the Win10 guest, I have 3.8.0svn. An old, but unofficial build? It has the trailing svn, indicating it was taken "somewhere along the way" during the 3.8.0 development. That might be annoying, we'll see.

Having multiple variants like this will help me fine-tune my llvm-crap.h compatibility file, and hopefully understand a bit better how LLVM interacts with software such as Gallium LLVMpipe. Gallium is for later, though. For now, it's much easier to try to understand with code I already know, using the ELFE and XL compilers.
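To see at a glance which LLVM variants a machine exposes, a small loop over the search path is enough. This is only a sketch: the function name list_llvm_versions is mine, and it assumes each variant ships a tool whose name starts with llvm-config and accepts --version, which may not hold for every package.

```shell
# List every llvm-config* tool found in a colon-separated search path
# (defaults to $PATH), together with the version it reports.
list_llvm_versions() {
    search_path=${1:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        IFS=$old_ifs
        for tool in "$dir"/llvm-config*; do
            if [ -x "$tool" ]; then
                printf '%s %s\n' "$tool" "$("$tool" --version)"
            fi
        done
    done
    IFS=$old_ifs
}

list_llvm_versions
```

With brew, the versioned installs are typically not in the PATH, so the directories would have to be passed explicitly as the first argument.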

Resuming bug hunt

Tuesday, December 12, 2017

Spice server de-sync bug

Still getting this one after a few seconds, and this is "new" (saw it last week before leaving for Paris):

ddd@f25-turbo[version] spice-streaming-agent> DISPLAY=:1 spice-streaming-agent -c noblock=yes

spice-streaming-agent[2967]: UNKNOWN msg of type 5
spice-streaming-agent[2967]: BAD VERSION 0 (expected is 1)
spice-streaming-agent[2967]: BAD VERSION 108 (expected is 1)
spice-streaming-agent[2967]: BAD VERSION 97 (expected is 1)
spice-streaming-agent[2967]: read command from device FAILED -- read 1 expected 8
spice-streaming-agent[2967]: FAILED to read command
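One observation, offered as a guess rather than a diagnosis: the bogus "versions" 108 and 97 are both printable ASCII codes, which is what one would expect if the agent were reading text bytes where it expects a message header. The helper name byte_to_char below is mine; the snippet only decodes the two values from the log.

```shell
# Convert a decimal byte value to the corresponding ASCII character.
byte_to_char() {
    printf "\\$(printf '%03o' "$1")"
}

for v in 108 97; do
    printf '%s -> %s\n' "$v" "$(byte_to_char "$v")"
done
# prints:
# 108 -> l
# 97 -> a
```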

Fixing networking in Linux VM

The Linux partition VM that I had repaired after the upgrade to High Sierra was still not functional. It had no network. After looking for the usual suspects in a VM, I traced it back to having no /etc/sysconfig/network-scripts/ifcfg-ens160 for my network card.

Why this file disappeared I don't know, but this seems to be a somewhat regular occurrence. Recreated it as indicated in that thread, and everything is fine now.
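For reference, recreating such a file can be sketched as below. The thread is not reproduced here, so the contents are an assumption: a plain DHCP configuration using standard ifcfg keys. The function name write_ifcfg and its optional directory argument are mine.

```shell
# Write a minimal DHCP ifcfg file for the given device.
# $1: device name (e.g. ens160)
# $2: target directory (defaults to the usual network-scripts location)
write_ifcfg() {
    dev=$1
    dir=${2:-/etc/sysconfig/network-scripts}
    cat > "$dir/ifcfg-$dev" <<EOF
TYPE=Ethernet
DEVICE=$dev
NAME=$dev
BOOTPROTO=dhcp
ONBOOT=yes
EOF
}

# Typical use, as root:
#   write_ifcfg ens160
```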

Adding MSYS2 in Windows 10 VM

I added MSYS2 in my Windows 10 partition VM. What is annoying is that after installing it, I have only 5G left out of a 40G disk. I fondly remember a time when filling a 40MB disk took some effort. Now, a basic OS with just some development tools (Visual Studio + MSYS2) takes 35G.

At the same time, disk space on typical corporate laptops remains an issue. With a 500G disk, running a main OS and two VMs is a bit problematic. Of course, I can carry around a 3TB disk to have extra space, but if 3TB disks are available, why do beancounters insist on me having to buy it rather than have it INSIDE the laptop to start with?

So for now, I guess I will put data files remotely on the main partition.

Installing Windows 10 VM on Muse

Started installation of a Windows 10 VM on Muse. For some reason, the Windows10 VM on Turbo runs until the point where I connect to it with virt-manager, at which point it dies abruptly.

I suspect this is related to the de-sync issue I am seeing above, and that means my SPICE server library is not working correctly. Indeed, connecting with my homebuilt spicy client works (that one was actually built with c3d/build). It's annoying if version mismatches between client and server cause the guest to die.

Looking at the qemu log, I see:

/usr/bin/qemu-system-x86_64: symbol lookup error: /usr/local/lib/libspice-common.so: undefined symbol: celt051_mode_create
2017-12-12 17:43:27.214+0000: shutting down, reason=crashed

So the problem is really with the dynamic linking of libcelt051. Probably something wrong with the way c3d/build links libraries.
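A quick way to confirm this kind of problem is to list the symbols the library leaves undefined and check whether the loader pulls in anything that could provide them. This is a sketch: the helper name undefined_symbols is mine, and it simply filters the output of nm -D.

```shell
# Read `nm -D` output on stdin and print the undefined symbols
# whose name matches the given pattern.
undefined_symbols() {
    awk -v pat="$1" '$1 == "U" && $2 ~ pat { print $2 }'
}

lib=/usr/local/lib/libspice-common.so
if [ -r "$lib" ]; then
    # Symbols the library expects someone else to provide
    nm -D "$lib" | undefined_symbols celt051
    # Libraries the loader would actually pull in
    ldd "$lib" | grep -i celt || echo "no celt library linked"
fi
```

If the symbol shows up as undefined and no celt library appears in the ldd output, the library was most likely linked without its libcelt051 dependency.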

Of course, now the Windows10 VM on Turbo is complaining that it did not restart correctly, and is showing all sorts of nasty messages in beautiful blue. The color palette at Microsoft improved, the messages did not.

Jenkins

A lot of this setup effort is related to my attempts to add some Windows and Fedora machines to my Jenkins setup.

Wordpress ate my homework

Friday, December 8, 2017

Return from Paris. Started my day with regular email catch up and trying to get back to the issue I had seen on the streaming agent at the beginning of the week.

Then decided to "lighten up" a bit and write a blog article about my trip to Paris. Bad idea. I wrote about 1000 words, and Wordpress kept showing me the "Saving/Saved" message, but when I published, I had zero words. Worse yet, the history showed only my initial draft, with text up to the first paragraph, and everything after that was empty.

It turns out that there is apparently a bug in Wordpress that triggers if you paste an image (instead of importing it as a file). You end up with a fairly large chunk of data in the HTML view, apparently "too big" for Wordpress to accept as input. So it fails, but does not tell you.

That annoyed me quite a bit. I took a walk with the dog. At least, the dog does not eat my homework.

Otherwise, the trip in Paris was quite nice. Met some people from AdaCore, who reminded me that I have some pending writeups on the Alsys Ada commenting style. I'll try to finish during the week-end.

Happy Birthday

Tuesday, December 5, 2017

Entering my 50th year. Hard

Added some Monty Python dialogues to my Build presentation.

Fighting some issues with the streaming agent:

% DISPLAY=:1 spice-streaming-agent -c blocking=no
spice-streaming-agent[2390]: UNKNOWN msg of type 5
spice-streaming-agent[2390]: BAD VERSION 0 (expected is 1)
spice-streaming-agent[2390]: BAD VERSION 108 (expected is 1)
spice-streaming-agent[2390]: BAD VERSION 97 (expected is 1)
spice-streaming-agent[2390]: read command from device FAILED -- read 1 expected 8
spice-streaming-agent[2390]: FAILED to read command

Realized that after the latest High Sierra upgrade, while my partitions still boot as partitions, VMware is now complaining. I could easily restore the Win10 partition because it's some kind of standard BootCamp thing. But the Linux partition fails with "Your Boot Camp virtual disk configuration is out of date".

The trick from Sep 25 no longer works:

cd "/Applications/VMware Fusion.app/Contents/Library"
./vmware-rawdiskCreator create /dev/disk0 PhysicalLinuxPartitions 3,4 lsilogic
Unable to determine partition start sector(s).

Compared to Sep 25, partition numbers have been changed to adjust to the numbers that diskutil list now shows:

% diskutil list
/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *500.3 GB   disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:                 Apple_APFS Container disk1         404.7 GB   disk0s2
   3:                  Apple_HFS Linux EFI               200.3 MB   disk0s3
   4:           Linux Filesystem                         47.1 GB    disk0s4
   5:       Microsoft Basic Data BOOTCAMP                48.1 GB    disk0s5

Figured out a way to do it:

  1. Create another BootCamp partition
  2. Copy partition information from the old physical map, but into the new physical disk, with the current offsets.

That works. End result looks like this:
# Extent description
RW 34 FLAT "Boot Camp-pt.vmdk" 0
RDONLY 6 FLAT "/dev/disk0" 34 partitionUUID @disk:diskModel=APPLE|20SSD|20SM0512G,diskSize=500277790720
RW 409600 ZERO
RW 790360024 ZERO
RW 391168 FLAT "/dev/disk0s3" 0 partitionUUID @partition:diskModel=APPLE|20SSD|20SM0512G,diskSize=500277790720,partSize=200278016,partOffset=404874067968,partMediaUUID=E2C92051-7C84-49A6-B681-02B855B959B1,partVolumeUUID=DF8BAE6C-B37D-36D7-A650-75A78232442B
RW 92088320 FLAT "/dev/disk0s4" 0 partitionUUID @partition:diskModel=APPLE|20SSD|20SM0512G,diskSize=500277790720,partSize=47149219840,partOffset=405074345984,partMediaUUID=E531F1DF-1E3B-426B-9D70-422079FDBE06
RW 93855744 ZERO
RW 131 ZERO
RW 33 FLAT "Boot Camp-pt.vmdk" 34
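The "current offsets" can be cross-checked with a bit of arithmetic. The sketch below assumes that extent sizes are counted in 512-byte sectors (an assumption on my part, but the numbers agree with it); the helper name bytes_to_sectors is mine.

```shell
# Convert a size or offset in bytes to a count of 512-byte sectors.
bytes_to_sectors() {
    echo $(( $1 / 512 ))
}

bytes_to_sectors 200278016       # partSize of disk0s3   -> 391168
bytes_to_sectors 47149219840     # partSize of disk0s4   -> 92088320
bytes_to_sectors 404874067968    # partOffset of disk0s3 -> 790769664

# Sum of the extents that precede disk0s3 in the description;
# it should match the partOffset of disk0s3 in sectors.
echo $(( 34 + 6 + 409600 + 790360024 ))    # -> 790769664
```

The two partition extents (391168 and 92088320) match the partSize values, and the extents before disk0s3 add up to its partOffset, which is the consistency the description needs.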

Problem is that I now have no network, either in the VM or when booting physically. lspci shows a controller, why doesn't it activate?

03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

For some reason, the network service was disabled, fixed with

# systemctl start network.service
# systemctl enable network.service

Rebuilding Spice client for macOS from scratch

Friday, December 1, 2017

Here is a quick recipe to rebuild a macOS SPICE client from scratch. There are two methods, one using autotools, one using c3d/build. The autotools approach is presently complicated on macOS due to a bug in autotools and to differences between Apple and GNU tools.

Building with make and c3d/build

This approach is still experimental (as in "published today"), but it is significantly faster than autotools (as in about 16 times faster).

  1. Install Homebrew as follows:
    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    
  2. Install the required dependencies for building:
    brew install pkg-config gettext intltool pixman gtk+3 gtk-doc gstreamer gstreamermm libjpeg-turbo gst-plugins-good gst-plugins-bad gst-libav
    

    brew cask install xquartz

    Note that installing XQuartz is not necessary for the SPICE client, only for the SPICE streaming agent, but this ensures that all SPICE components build successfully. Also, this step is only used as a quick way to install the /usr/X11 directory. If you have that directory on your system, you can skip the brew cask install xquartz step.
  3. Clone the top-level directory for SPICE and go to that directory:
    git clone https://github.com/c3d/spice
    cd spice
    
  4. Tell pkg-config where to find packages installed by brew:
    export PKG_CONFIG_PATH=/usr/local/opt/jpeg-turbo/lib/pkgconfig:/usr/local/Cellar/openssl/1.0.2m/lib/pkgconfig:$PKG_CONFIG_PATH
    
    Make sure that you adjust the path for openssl to match what brew actually installed. The OpenSSL package is not seen by default by pkg-config because it conflicts with Apple's own versions.
  5. Run the actual build:
    make -j
    
  6. If there is no error, you should be able to run spicy:
    ./spicy -h some-host -p 5900
    
You can switch this top-level directory to autotools by running the ./autogen.sh script from the top-level, and restore to c3d/build by running make gitclean.
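Regarding step 4, the OpenSSL version number can be kept out of the path by asking brew where each package lives. This is a sketch, not what I actually ran: the helper name add_pkgconfig_prefix is mine, and it assumes the packages keep their .pc files under lib/pkgconfig in their prefix.

```shell
# Prepend <prefix>/lib/pkgconfig to PKG_CONFIG_PATH.
add_pkgconfig_prefix() {
    PKG_CONFIG_PATH="$1/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export PKG_CONFIG_PATH
}

# Only meaningful on a machine where Homebrew is installed
if command -v brew >/dev/null 2>&1; then
    add_pkgconfig_prefix "$(brew --prefix openssl)"
    add_pkgconfig_prefix "$(brew --prefix jpeg-turbo)"
fi
```

Running pkg-config --modversion openssl afterwards is a simple way to check that the package is now visible.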

Building using autotools

The steps are somewhat similar, but there is an additional requirement to patch the current version of autotools. This is also more likely to run into relatively mysterious errors, so make sure to check the Possible errors section below if you have a problem.

  1. Install Homebrew as follows:
    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    
  2. Install the required dependencies for building:
    brew install pkg-config autoconf automake gettext intltool pixman gtk+3 gtk-doc gstreamer gstreamermm libjpeg-turbo gst-plugins-good gst-plugins-bad gst-libav
    

    brew cask install xquartz

  3. After verifying the version for OpenSSL installed by brew, set up the environment for building as follows:
       export PKG_CONFIG_PATH=/usr/local/opt/jpeg-turbo/lib/pkgconfig:/usr/local/Cellar/openssl/1.0.2m/lib/pkgconfig:$PKG_CONFIG_PATH
       export PATH=$PATH:/usr/local/Cellar/gettext/0.19.8.1/bin:/usr/local/Cellar/automake/1.15.1/bin
       export CFLAGS='-ObjC'
    
  4. Clone the spice-protocol repository and go there:
    cd ~/my-build-directory
    git clone git://anongit.freedesktop.org/spice/spice-protocol
    cd spice-protocol
    
  5. Run auto-configuration in spice-protocol (it may be a good time to take some coffee):
    ./autogen.sh
    
  6. Build and install spice-protocol:
    make install
    
  7. Clone the top-level directory for the SPICE GTK viewer, and go to that directory:
    cd ~/my-build-directory
    git clone git://anongit.freedesktop.org/spice/spice-gtk
    cd spice-gtk
    
  8. Run the auto-configuration for spice-gtk, and go grab another coffee:
    ./autogen.sh
    
  9. You may need to run configure manually, because the default configuration is

Possible errors

  • Dependency on Objective-C headers: The CFLAGS='-ObjC' is required because one of the keyboard mapping files includes a header that uses Objective-C syntax. Without it, you will get something like:
    In file included from vncdisplaykeymap.c:95:
    In file included from /usr/local/Cellar/gtk+3/3.22.24/include/gtk-3.0/gdk/gdkquartz.h:23:
    In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/AppKit.framework/Headers/AppKit.h:10:
    In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h:8:
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObjCRuntime.h:506:9: error: unknown type name 'NSString'; did you mean 'GString'?
    typedef NSString * NSRunLoopMode NS_EXTENSIBLE_STRING_ENUM;
            ^

  • Autoconf bug in error reporting: While running auto-configuration, you may end up with a helpful error message that looks something like this:
    autoreconf: running: aclocal --force -I m4
    Use of uninitialized value $msg in concatenation (.) or string at /usr/local/Cellar/autoconf/2.69/bin/autom4te line 1026.
    Use of uninitialized value $stacktrace in pattern match (m//) at /usr/local/Cellar/autoconf/2.69/bin/autom4te line 1026.
    unknown channel m4trace: -1- AS_VAR_APPEND(ac_configure_args, " '$ac_arg'")
     at /usr/local/Cellar/autoconf/2.69/share/autoconf/Autom4te/Channels.pm line 638.
    	Autom4te::Channels::msg('m4trace: -1- AS_VAR_APPEND(ac_configure_args, " '$ac_arg'")\x{a}', undef, 'warning: ', 'partial', 0) called at /usr/local/Cellar/autoconf/2.69/bin/autom4te line 1026
    aclocal: error: echo failed with exit status: 1
    autoreconf: aclocal failed with exit status: 1
    
    If you see this, then you need to patch autom4te as explained in this comment. You can for example run:
    sudo vi /usr/local/Cellar/autoconf/2.69/bin/autom4te
    
    then search for the message # Trace with arguments, and insert the following text just above it:
          # Traces without file/line
          next if (m{^m4trace: -(\d+)- ([^(]+)\((.*)$});
    
    In other words, you need to apply the following patch:
    diff --git a/usr/local/Cellar/autoconf/2.69/bin/autom4te.old b/usr/local/Cellar/autoconf/2.69/bin/autom4te
    --- a/usr/local/Cellar/autoconf/2.69/bin/autom4te.old
    +++ b/usr/local/Cellar/autoconf/2.69/bin/autom4te
    @@ -821,6 +821,8 @@ EOF
       my $traces = new Autom4te::XFile ("< " . open_quote ($tcache . $req->id));
       while ($_ = $traces->getline)
         {
    +      # Traces without file/line
    +      next if (m{^m4trace: -(\d+)- ([^(]+)\((.*)$});
           # Trace with arguments, as the example above.  We don't try
           # to match the trailing parenthesis as it might be on a
           # separate line.
    
    Once you have applied the patch, you will see the actual error that the autoconf bug was trying to report but failed to, for example:
    configure: error: Package requirements (spice-protocol >= 0.12.13) were not met:
    No package 'spice-protocol' found
    Consider adjusting the PKG_CONFIG_PATH environment variable if you
    installed software in a non-standard prefix.
    Alternatively, you may set the environment variables SPICE_PROTOCOL_CFLAGS
    and SPICE_PROTOCOL_LIBS to avoid the need to call pkg-config.
    See the pkg-config man page for more details.
    

  • Configuration errors. For some reason, autogen.sh may leave you with an invalid configuration if you don't run it manually. In that case, you get this:
    make install
    echo 0.34.23-0381-dirty > .version-t && mv .version-t .version
    /Library/Developer/CommandLineTools/usr/bin/make install-recursive
    Making install in spice-common
    make[2]: *** No rule to make target `install'. Stop.
    make[1]: *** [install-recursive] Error 1
    make: *** [install] Error 2

    If you see this, simply run ./configure manually.
  • Forced errors for warnings: You may see an error like:
    channel-display-mjpeg.c:117:2: error: "You should consider building with
          libjpeg-turbo" [-Werror,-W#warnings]
    #warning "You should consider building with libjpeg-turbo"
     ^
    1 error generated.
    
    There were apparently recent changes in the Homebrew-installed jpeg-turbo, which make the version incompatible with the regular JPEG version. For now, simply remove the #warning.

In the end...

Let me conclude with a personal opinion...

On my machine, the complete build with c3d/build takes about 15 seconds for a debug build and about 30 seconds for an optimized build. With autotools, it takes roughly 3 minutes, that's about 16 times slower.

The description of what's required to build SPICE with c3d/build is presently about 800 lines of standard Makefile. By contrast, with autotools, it takes about 5x as much: 1303 lines for configure.ac files and 2428 lines of Makefile.am (granted, at present, building very slightly more, e.g. documentation).

Autotools: Complex non-solutions to simple non-problems.

As far as I know, there is nothing autotools makes that make cannot do as well and faster. QED.

Patches Wednesday

Wednesday, November 22, 2017

Several reviews.

Patch "Report reason when there is an error loading the plugin" sent, acked, pushed.

Patch "Report initialization errors more precisely" sent.

Patch "Implement version checking for plugins without violating ODR" Matching plugin patch.

Autoconf/Build Comparison

Tuesday, November 21, 2017

Feeling sick today (stomach flu), so I spent a little time doing something not too hard on my brain, finished the 'build' Makefiles for spice-gtk, in preparation for the talk on 'build' I'm giving on December 7.

The makefile is a bit complicated, because there are several steps along the way, generated files, etc.

The build time results are promising (all numbers are times in seconds, best of 3):

                                    Build (-j)   Build   Autoconf (-j)   Autoconf
Clean build from git                25           45      95              143
Incremental build (log.h)           18.1         34.9    19.3            61.3
Incremental build (snd_codec.c)     1.9          1.9     3.0             3.9
Incremental build (spice-client.c)  1.7          1.7     3.3             3.8
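To put these numbers in perspective, the ratio between the two build systems can be computed directly from the table. A small sketch (the function name speedup is mine):

```shell
# Print how many times slower the second time is than the first,
# rounded to one decimal place.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

speedup 25 95      # clean build, parallel    -> 3.8
speedup 45 143     # clean build, sequential  -> 3.2
speedup 18.1 19.3  # log.h, parallel          -> 1.1
speedup 34.9 61.3  # log.h, sequential        -> 1.8
```

So for a clean build from git, 'build' is between 3 and 4 times faster, while the gain on the log.h incremental build is much smaller when building in parallel.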

The maintenance complexity is another aspect where there is a real difference:

               Build            Autoconf
Makefile       442 lines, 13K   939 lines, 26K
configure.ac   (None)           710 lines, 24K
Total          442 lines, 13K   1649 lines, 50K