Visit my new weblog: john.freml.in.
It's possible to upgrade to Xorg on Debian by changing your apt sources.list to include the Ubuntu repo. Note that this might mess up your setup. Anyway here's my sources.list
deb http://ftp.uk.debian.org/debian/ unstable main non-free contrib
deb-src http://ftp.uk.debian.org/debian/ unstable main non-free contrib
deb http://archive.ubuntu.com/ubuntu/ hoary main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ hoary main restricted universe multiverse
# mplayer http://marillat.free.fr/
deb ftp://ftp.nerim.net/debian-marillat/ testing main
deb ftp://ftp.nerim.net/debian-marillat/ unstable main
Unfortunately this gizmo seems to fail to conform(?) to the USB HID specification and needs a patch from Ben Collins (usb-minmax.diff) to twiddle the kernel into accepting it. The patch makes the kernel use the physical extents instead of the logical ones to find out where the joystick axis is (the logical ones are misreported by the device and result in the leftmost position being interpreted as 0, the rightmost as -1, and the centre as -128). It works fine with MS Windows XP (presumably MS Windows has this bug(?) built in).
Here are the details for my fine arcade-like USB gamepad.
T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 8 Spd=1.5 MxCh= 0
D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=073e ProdID=0301 Rev= 1.00
S: Manufacturer=NEC
S: Product=USB Game Pad
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 50mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=hid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=10ms
Updated the colour of this website.
Even if X11Forwarding is enabled in /etc/ssh/sshd_config and the X libraries are installed, the xauth program might be missing from your Debian system, so X11 Forwarding cannot work. The solution is to install xauth which is in xbase-clients.
Booting from CD is impossible for some reason; there is a rumour that the blocksize for the CD drive is too large for the firmware to handle.
It is possible to install debian over the serial port without modifying the floppy images. If no keyboard is plugged in the SRM bootloader automatically switches to use the first serial port at 9600 baud 8N1. aboot, the Linux bootloader on the floppy, follows suit. If boot_osflags are set to "a" or empty (in fact anything other than "0") then aboot will go to its command prompt. Here enter "l" to see the arguments for booting the default kernel. Boot a kernel with those arguments but also add "console=ttyS0,9600 prompt_ramdisk=0". The prompt_ramdisk thing is needed because for some reason the kernel does not take input from the serial port when it asks you to "press ENTER" when the ramdisk floppy is ready.
As soon as control jumps to the kernel switch floppies quickly :-) and everything should work out fine
Patched yafc so that it has an rcfile option not to keep inserting ... while waiting to download filename completions from the ftp server. (The dots insertion mean it works badly with my readline patch.)
The Microsoft Intellimouse Optical mouse has a very bright light that can be very irritating if you're trying to watch a video or sleep in the same room as the mouse. In fact the mouse has two LEDs - one for lighting up the terrain for the optical sensor and the other apparently purely for a decorative red glare. It's easy to modify the mouse and remove the decorative one.
If you peel up the two little elliptical pads on the base of the mouse at the end furthest from the cable, you will reveal two small Phillips screws. Unscrew these and you can remove the outer shell. Now unclip the wheel and you can unclip the PCB, which has a catch holding it to the red plastic base.
Now the decorative LED at the back of the mouse will be obvious. I simply pulled it off taking very little care not to damage the PCB ;-) and after reassembly the mouse now works fine without illuminating my living area.
Reap the zombies being made all the time by this fine program.
Fixed sitecopy to work with NcFTPd Server (licensed copy) on FTP1.KONTENT.De - very dodgy code in this fine program.
Updated bash nostat patch to bash 2.05a.
Updated readline patch to readline 4.2a and made it work with yafc (which likes to add ... in while waiting for autocomplete).
Found and fixed a bug in ide-scsi wrt this command and CDROM_LEADOUT.
Finished updating cdd to use track index format instead of MSF. It can now find the end of the CD and therefore works on my fine LG CD-writer.
Doug has come up with a version that finally works ok! Long live RedHat and so on, there was about three weeks worth of support hassle crammed into one and a half days thanks to Doug's sterling efforts.
Produced squads of debug traces for Doug Ledford, who is playing with the i810_audio driver and very sportingly helping get SiS 7012 sound working. The main trouble is DMA write overruns.
Made a patch to QuakeWorld current that allows me to use the extra buttons on my Microsoft Intellimouse Optical, and submitted it.
startx from XFree86-4.1.0 seems to be vulnerable to people snooping command lines - it displays MIT cookies there.
XEmacs and GNUS do not like LC_NUMERIC at all. In fact GNUS complains whenever you try to read an article, saying something like "unable to split window (size 0 too small)" and not showing the article.
Found a bug in getloadavg similar to that in procps from long ago. Patch submitted.
The fremlin.com name was set to expire today. However, it does not officially expire until the whois database from whois.internic.net is updated with it missing. This database is apparently updated irregularly, every 1 or 3 days, at about 0200 hours New York (EDT) time. Presumably the database merge takes about 2 hours. Luckily (or so I thought) the database was updated today.
Unfortunately the current registrar of fremlin.com is networksolutions, and their database was not updated before the merge. Therefore fremlin.com is still in existence until whois.networksolutions.com forgets it and whois.internic.net merges the update.
Running badblocks on a large partition tends to result in block numbers which are according to e2fsck out of range. The reason is that badblocks does not take the block size from the filesystem, so the block numbers have to be adjusted. For example if the fs has 4096 byte blocks and badblocks was working at its defaults of 1024 bytes per block, the block numbers from badblocks should be divided by 4.
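The adjustment described above is just an integer division. A minimal sketch, assuming badblocks ran with its default 1024-byte blocks and the filesystem uses 4096-byte blocks; the function name and parameters are made up for illustration:

```python
def rescale_bad_blocks(blocks, badblocks_size=1024, fs_block_size=4096):
    """Convert block numbers reported by badblocks into filesystem block numbers.

    Illustrative helper only: assumes the filesystem block size is a whole
    multiple of the block size badblocks was run with.
    """
    if fs_block_size % badblocks_size != 0:
        raise ValueError("filesystem block size must be a multiple of the badblocks size")
    ratio = fs_block_size // badblocks_size
    # Several small blocks land inside one filesystem block, so deduplicate.
    return sorted({block // ratio for block in blocks})
```

For example, 1024-byte blocks 4096 and 4097 both fall inside filesystem block 1024, so the list comes out shorter than the one badblocks produced.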
See 17 February 2001 for the background on this. There was a change in 2.4.0 - 2.4.1 that meant CDS_NO_DISC was returned from ide-cd cdinfo in the linux kernel, when it should return CDS_TRAY_OPEN. This change (I just noticed) also broke setting the cd drive to auto open and close, because internal cdrom.c code expects CDS_NO_DISC when there is no disc. Patch submitted.
An easy mathematical question in the last word today. Sent in an answer in the hope of the £25 cheque. Also checked out the other unanswered questions and discovered why they were unanswered ;-)
The frag.com people are down, and the 1.10 version seems to be only available for BeOS. Eventually I located the 1.01 revision (Eraser101-linux.tar.gz) on http://games.mark-itt.ru/dl/q2patch/. Strangely enough Quake2 (Quake II) is faster with the voodoo2 than the Quake I QuakeForge client. (The old glqwcl.3dfx client segfaults for no reason whenever a level is displayed.)
The little fellows are rather easy to beat on standard settings ;-)
Made a patch against Python 2.1 but had trouble submitting it due to broken interactions between SourceForge and lynx.
Fixed all bugs outstanding and added option to exec a script when events hit.
I don't like seeing any headers at all, so I put in my .emacs
(setq gnus-ignored-headers "[\n.]*" gnus-visible-headers 'nil)
But the trouble with this is that if you reply to a message without headers, the first paragraph of the message doesn't get yanked. Further, the message-goto-body command doesn't work: it skips the first paragraph.
The first problem has been bugging me for ages. I spent a few hours today figuring out where it went wrong and fixing it (very small patch of course).
glibc CVS seems to be broken wrt symbol versioning (afaics, perhaps it's my binutils). Therefore I reverted to glibc 2.2.1 - and nothing would start any more, complaining that the GLIBC 2.2.3 version was not provided. This was a caching problem: the old locations for the CVS glibc were stored in the ldconfig cache, so those libs were used in preference to the 2.2.1 ones.
Finally received my evaluation edition, a nasty MS-Windows CD. I tried to extract some stuff with cabextract but cabextract didn't recognise the files (and didn't say so). I submitted a patch to say that it doesn't recognise the input files in the hope that the maintainer will see his way to helping me ;-)
The boxevent patch caused my XFree86 background bouboule (from xscreensaver) to run slowly. Initially I suspected that some of my changes to kapmd were causing it to eat CPU - top showed mostly idle time, and kapmd eats CPU in idle time. After a lot of head scratching I traced the problem down to the misc_register call. After more head scratching, by a process of elimination I deduced that it must be something in userspace doing the damage.
And it was! XFree86, in fact, with a mangled APM driver, which you can turn off by adding
Option "NoPM" "True"
to the serverflags section of your XF86Config.
Some nasty hardlockups trying to run the binary only quake 3dfx miniport. Finally got a CVS glide2x working and stuck it on the glide project FTP space.
Found a stupid locking bug: I took the read lock when I should probably have taken a write lock. Still don't know what the lock ordering issues that Andi Kleen was talking about are, I wish I had a SMP box or two to test this stuff on ;-)
Changed the name of the pmevent patch to boxevent and put it in the drivers/char directory where it belongs. Hacked the APM driver wildly, killing a whole bunch of stuff. Also modified a bunch of other PM systems to use the boxevent kit, and sent off a patch, which bounced from most mailing lists because it had so many recipients :-)
Pain everywhere trying to get Glide CVS to work: the pcilib is broken therein it seems. I reverted it to an older version and more things started working, though I got a nasty hard hang when trying to play quake.
James Henstridge pointed out that I in fact didn't need gdk_init or gdk_rgb_init to get gnome-python to work, so a small amount of coding effort was wasted. The gdk-pixbuf load issue is a known problem and requires that the library be linked with libtool, according to James.
Wrote a python wrapper module for SysV IPC message queues. I don't know why such a thing is not included with the Python base distribution.
Cleanups everywhere in preparation for real code writing. The config and link system is really nice - if only those damn undefined references to unimplemented functions didn't screw things up ;-)
Updated my local setup to use nscd, the name service caching daemon, in place of bind 9. It is faster, but, unfortunately, probably has a vast number of security holes. As a result, I flushed out a number of problems in my ppp-scripts.
Further when resolv.conf changes (for example, because you connected to a different ISP) nscd never becomes aware of it. This is a general problem with all applications using the libc resolver interface.
It could be fixed in glibc by adding a field to struct __res_state with the last timestamp of the /etc/resolv.conf file, which would be checked with stat every gethostby* call. This would cause an extra syscall of overhead.
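The idea can be sketched outside glibc as follows. This is not the resolver's code, just an illustration of "stat the file on every lookup and re-read it when the timestamp moves"; the class and attribute names are invented for the example:

```python
import os


class ResolvConf:
    """Cache the nameserver list and re-read it only when the file's mtime changes."""

    def __init__(self, path="/etc/resolv.conf"):
        self.path = path
        self._mtime = None      # timestamp of the version we last parsed
        self.nameservers = []
        self.reloads = 0        # how many times the file was actually re-read

    def refresh(self):
        # One stat() per call: this is the extra syscall of overhead.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path) as f:
                self.nameservers = [
                    parts[1]
                    for parts in (line.split() for line in f)
                    if len(parts) >= 2 and parts[0] == "nameserver"
                ]
            self._mtime = mtime
            self.reloads += 1
        return self.nameservers
```

Calling refresh() before each lookup costs one stat and nothing more while the file is unchanged, which is the trade-off described above.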
And put it here.
Prepared to update pikpiker to use gnome-python with the new gdk-pixbuf bindings. I wanted to use gdk without gtk and some necessary functions were not wrapped (gdk_window_clear, gdk_window_set_back_pixmap, gdk_rgb_init, gdk_init) so I sent in a patch for them to James Henstridge.
An annoying problem: unless I LD_PRELOAD the libgdk_pixbuf.so I get an error when I use gdkpixbuf.new_from_file:
python: error while loading shared libraries: /packages/i586-pc-linux-gnu/gdk-pixbuf-0.10.1/lib/gdk-pixbuf/loaders/libpixbufloader-jpeg.so: undefined symbol: gdk_pixbuf_new_from_data
I sent James a bugreport about this. I wonder why the libgdk_pixbuf.so is not being loaded.
Borrowed a Voodoo2 graphics accelerator from Spod. Some difficulty locating the latest glide for this card. Daryll Strauss is credited with writing it. His latest version (with full source) is at sourceforge but does not seem to have received much attention, and is only available by cvs. I offered to make an official sourceforge release.
Fixed up a bug in my procmail filtering scheme (specifically the bit removing the garbage that some ill-mannerly mailing lists decide to inflict on their users). I was missing a \\ or two.
Karsten Festtag has been helpfully sending me versions of his SANE backend supposed to work with my scanner. I decided to look at the problem today and changed a bunch of stuff in the kernel driver (most usefully fixing the debug messages up). In the end the trouble was that Karsten's driver claimed to be asking for a certain data transfer when in fact it asked for a larger one, which, combined with scatter gather, resulted in a wedged scanner.
Sent Oliver the updated kernel driver and Karsten a one line patch ;-)
A security hole I've kept meaning to fix. Lucky nobody noticed.
Andi Kleen writes back to say there are locking issues with my dynip patch. Some work to do there then.
Tidied up APM driver changes in pmevent patch. Broadcast widely.
Many many changes. And penguinpowered.com seems to be down again so I signed up for a redirector from DHS.
Wrote generic power management event interface for the linux kernel.
Discussions with David Brownlee.
Sent to Andi Kleen. Hope the fellow replies, this patch has been dragging on for a while. It allows userspace to declare an interface invalid which is invaluable with a dial on demand link that shuts off by itself.
Much has happened in the intervening month or so. Today I decided to finally update the webpages.
I set up a cross compiler from i586-pc-linux-gnu to i386-pc-msdosdjgpp using the GCC CVS of today. There was quite a bit of hassle and some problems with C++ remain.
Linking fails on cpp0 because target_flags (used in the djgpp.h config file) are defined in rtlanal.c which is not included in the link. This can be fixed by also defining target_flags in cppinit.c, though that probably isn't the right fix since, as far as I can see, they are not modified anywhere.
The djcxr203.zip includes are faulty in that the gcc libiberty/getcwd.c bombed out because of undefined PATH_MAX. This can be fixed by #including "limits.h" (careful: with <limits.h> the GCC version is included instead of the DJGPP version) in unistd.h.
Updated to not annoyingly expand $HOME/.emacs to $HOME/home/john/.emacs.
Tried to do a scan and met with dismal failure. Seeing as I wrote the initial kernel support for this scanner I blamed my problems on that for a while, but it appears as though it was all SANE's fault.
Scanimage wouldn't give out any error message, it would just produce output like
P6
# SANE data follows
612 842
255
with no following data. There were in fact two problems with SANE backends 1.0.4. Some clown had set a timeout to 10 seconds that should have been much longer (it used to be 10 minutes). I sent a patch to correct this.
The more insidious problem was in the READ IMAGE STATUS SCSI command sent to the scanner, from scsi_read_image_status in microtek2.c. The scanimage log looked like this:
[microtek2] scsi_read_image_status: ms=0x8051710
[readimagestatus] 28008300600000000000
[sanei_scsi] scsi_req_enter: entered 0x8062560
[sanei_scsi] sanei_scsi.issue: 0x8062560
[sanei_scsi] scsi_req_enter: queue_used: 1, queue_max: 1
[sanei_scsi] sanei_scsi_req_wait: waiting for 0x8062560
[sanei_scsi] sanei_scsi.issue: 0x8062560
[sanei_scsi] sanei_scsi_req_wait: read 64 bytes
[sanei_scsi] sanei_scsi_req_wait: SCSI command complained: No such file or directory
The scanner got wedged because of the READ command with zero data length and refused to respond to anything anymore, so it got timed out. We weren't given the specs for the USB interface so we don't know how to reset the scanner, and the kernel driver wasn't able to do this.
The solution is to comment out the READ IMAGE STATUS call or to use a READ IMAGE STATUS call with a data length of 1. There are more complications in going down this route, as I explained to sane-devel.
Came across a presentation by Apple describing how they were going to update from MacOS to FreeBSD. Looks like their bad design decisions are repercussing. It seems they started out fine back in the day, with each application encapsulated in a single resource fork, but things seem to have deteriorated since then as apps began to install "system extensions" - might as well be MS Windows or something.
The problem of integrating resource forks with UNIX FS semantics they did not solve nicely either, I think a new syscall or two would be cleaner - otherwise you have nasty clashes when directories can have resources (as in UDF). The problem of copying files they can control completely because they distribute the command line and graphical interface file managers so I guess they were just lazy ;-)
In 2.4.0 - 2.4.1 detecting whether my DVD drive is open stopped working. The CDROM_DRIVE_STATUS incorrectly returns CDS_NO_DISC. It should return CDS_TRAY_OPEN. This is presumably due to a change by Michael D Johnson who babbled about "Mt Fuji extended media tray reports" in a comment.
Release 0.2.4 with fork throttling, better debug messages, better bounce messages and bugfixes. Can now inject unmodified text into an SMTP server (--no-parse option).
Fixed the bug introduced by the last bugfix and cleaned up the readline code slightly by introducing rl_replace_line. Now available.
When bash tries to complete a command, it searches through the directories in $PATH (using getdents(2) which is very fast) then stats each entry twice (itself inefficient - one stat to find out whether the filesystem object is executable, another to see if it's a directory). All the binaries in my path are symlinks. The result is that it can take a noticeable time to complete a name, because each stat of a symlink requires quite a bit of IO. I profiled bash with strace and found that it was spending nearly a second on my K6-2 300 in stat(2) when completing a single command (starting from a flushed kernel cache obviously).
I made a patch against bash 2.04 that introduces a new shopt shell option, called "stat_all_in_path" which is set by default. When it is unset, bash will not try to stat entries in the PATH at all. This results in noticeably better interactive performance on my setup, at the expense of correctness.
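The trade-off can be illustrated outside bash with a small sketch. This is not the patch itself: the function below is invented for the example, and only its stat_all_in_path flag borrows the name of the shell option. With the flag set, every candidate is checked (following symlinks, which is where the IO goes); with it unset, directory entries are trusted as they are.

```python
import os


def complete_command(prefix, path_dirs, stat_all_in_path=True):
    """Return the sorted names in path_dirs that start with prefix.

    With stat_all_in_path=True each candidate is checked (directories and
    non-executables are dropped). With False no per-entry check is made,
    which is faster but may offer names that cannot be run.
    """
    matches = set()
    for directory in path_dirs:
        try:
            names = os.listdir(directory)   # cheap: one directory read
        except OSError:
            continue                        # ignore missing PATH entries
        for name in names:
            if not name.startswith(prefix):
                continue
            if stat_all_in_path:
                full = os.path.join(directory, name)
                # Both checks follow symlinks and so touch the target inode.
                if os.path.isdir(full) or not os.access(full, os.X_OK):
                    continue
            matches.add(name)
    return sorted(matches)
```

The "expense of correctness" shows up as extra candidates: with the check disabled, subdirectories and non-executable files in the PATH are offered as completions too.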
Sent in the changes necessary to get cpio 2.4.2 to build with glibc 2.2 and gcc 2.95.2. A bunch of violations of GNU standards, where stuff was declared that shouldn't have been.
Fixed the last remaining bug - where grabbing a history entry would result in a completer being placed at the start rather than at the end of the line. Available here.
Sent a patch to allow using devfsd just to set up compatibility device names to Richard Gooch, who is not feeling well at the moment. Migrated personal distribution completely over to devfs.
Got a reply from the maintainer Jamie Zawinski who opined that webcollage had a hard to follow flow of control that was not helped by my patch which neglected to comment anything. I agreed with him. Further I pointed out the nasty little security flaw in it - the filenames for temporary files it chooses are based only on its PID, so it's easy to guess them in advance, set up a symlink to one of the webcollage process owner's files, and have it overwritten. If you exploit this, send me an email or something ;-)
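To make the temporary-file problem concrete, here is a sketch in Python rather than webcollage's perl. The file name pattern "webcollage.PID" is an assumption for illustration, not the name the script really uses; the safe variant relies on tempfile.mkstemp, which picks a random name and creates the file exclusively.

```python
import os
import tempfile


def predictable_name(directory, pid=None):
    """The risky pattern: a name anyone can compute in advance from the PID."""
    pid = os.getpid() if pid is None else pid
    return os.path.join(directory, "webcollage.%d" % pid)  # hypothetical pattern


def safe_temp_file(directory=None):
    """Create a temp file with an unpredictable name.

    mkstemp opens the file with O_CREAT | O_EXCL, so it fails rather than
    writing through a symlink somebody planted beforehand.
    Returns (file descriptor, path); the caller closes and removes it.
    """
    return tempfile.mkstemp(prefix="webcollage.", dir=directory)
```

An attacker who knows the PID can create the predictable name as a symlink before the program runs; with mkstemp there is no name to guess, and an existing entry makes the open fail.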
My jolly old lexer kept faulting out on terminate() due to an "uncaught exception". Naturally, I thought I myself responsible and ransacked my code for dodgy throw() specifications etc. Nothing. I bunged in a few catch(...) handlers and discovered nothing. I changed compilers to gcc-2.97 of today and still got the same problem. I then figured it was something I didn't understand about exceptions and perused Stroustrup and the draft spec to no avail.
I turned to my debugger and traced the __throw going down the stack. The clowns writing ANTLR had forgotten to watch out for throwing an exception in the Tracer class destructor while there was an exception in progress. One reason Java doesn't have destructors I guess, though it would mean that doing something like Tracer is significantly more complicated.
My local webcollage is heavily updated - it will fill a screen much faster than the old version, because for every search it follows a few of the hits, and for each hit it takes a few pictures. Added northernlight.com search engine. I sent the thing on to the maintainer.
Changed my address to email@example.com to cheer myself up.
Got severely bitten by link order problems. Lots of pain.
I hate java. There is however a useful LL(k) parser (and LL(k) lexer!) generator written in it called ANTLR. Getting a JVM that works with ANTLR has been a near constant source of frustration to me. It's not like ANTLR does anything complicated either - no threads, no graphics, no sound, just basic IO. A year or so ago the blackdown JDK would mysteriously (but deterministically) eat single characters at odd places in the output. So I switched to kaffe. Today I tried to generate my lexer, but it didn't work. It complained about an unexpected character ÿ. This confused me greatly, and I fiddled with my lexer.g to no avail. After a lot of futile fooling about (with character set declarations etc.), I bunged in an empty lexer.g and got the same error. What could be wrong? I had run it previously with kaffe. I tried with the old libc 2.1. No dice. I tried it with the old kernel 2.2, similarly to no avail.
I turned to the ANTLR source. After a lot of chasing around, I discovered the dubious function for reading data was java.io.FileInputStream.read(). This is implemented as a native method in Kaffe as libraries/clib/io/FileInputStream.c using a KREAD macro that dynamically looks up the correct syscall wrapper for read(2). Jeez. The lookup generally resolves to a function in kaffe/kaffevm/systems/unix-jthreads/jthread.c. I straced the process and found it was dying after a 1-byte read returning 0 (meaning EOF). In this case, after going through several contortions, a value of -1 would be returned to the Java code. The Sun documentation says this is indeed the correct behaviour - why then was ANTLR behaving strangely? The only reasonable conclusion was that it was being miscompiled in some way. I tried the Kaffe VM on the original distributed bytecode and it worked fine. Therefore Kaffe must have been miscompiling the source code to byte code.
Kaffe 1.0.6 doesn't work (last released version), but the CVS of today does (with a small tweak to get it to build). Why do I dislike Java?
When the IP address of an interface changes, TCP connections with the old source address are useless. Applications are not notified of this and time out ordinarily, just as if nothing had happened. This behaviour isn't very helpful when you have a dynamic IP and know you're probably not going to get the old one back. In that case, you want processes to get errors when they try to use one of the dead connections, so they can handle the disconnect more cleanly. Otherwise fetchmail, etc. can just hang waiting for ages. Andi Kleen implemented this functionality with a per interface flag in 2.2. See the iff-dynamic patch series.
This solution is not ideal because it adds some unnecessary state for the kernel to track, and userspace can't control precisely when the connections are marked dead. I made a patch for a new ioctl called SIOCKILLADDR, which marks finished all the IPv4 sockets with the specified source address. I hacked pppd to use the functionality (with a new killoldaddr option) and sent off my first network layer patch to a large number of mailing lists in the hope that it won't be completely ignored.
Fooled around with webcollage by Jamie Zawinski from xscreensaver. Cute little program, if a bit thrashy due to the dubious perl GC. It uses the pnm tools and xloadimage to display a funky changing pastiche of images hauled at random from the web. It does this by searching for random words on random search engines, picking a random link and following it.
Normally it picks its words from the system word-list, but I think that using dict/words is a bit dull and not very 2k+1. I got my best session using the newsgroups "active" file (names of all newsgroups) with tr . \\n and boring news related words like "alt" or "comp" swept out. It gave very current net images - e.g. OSDN logos (and stayed free of smutty pictures).
I added support for google.com and PNG images, as well as miscellaneous hacks. The lower bits of rand are allegedly generally less random than the higher order bits according to Stroustrup. I don't know how perl does rand, but it seems to me a better idea to do int(rand() * (small integer+1)) instead of rand(big integer) % small integer, so I made the necessary changes.
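A small sketch of the difference, in Python rather than perl. The generator below is a textbook linear congruential generator with a power-of-two modulus, chosen only because its low bits are visibly poor; it is an assumption for illustration and says nothing about how perl implements rand.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Textbook LCG (illustrative constants); yields integers in [0, m)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def pick_by_modulo(value, n):
    """rand(big) % small: the result depends on the low-order bits."""
    return value % n


def pick_by_scaling(value, n, m=2**31):
    """int(rand() * small): the result depends on the high-order bits."""
    return int(value / m * n)
```

With an odd multiplier and an odd increment, the lowest bit of this generator simply alternates, so pick_by_modulo(x, 2) gives a strictly alternating sequence, while pick_by_scaling(x, 2) reads the top bit instead.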
I asked to sit the M4 Edexcel Maths module a long time ago. The blighter in charge of sending off this application is a fellow called Mr Chester. He gave me to understand that the jolly old exam board just moves a bit slowly, yes the thing was being processed but at its own pace. Then I asked him yesterday if the paper had come through yet. He said, no, it would be faxed through today. Surprise, surprise it wasn't - so I didn't get to sit the module and so my cunning ploy to have two goes at the further maths A level is in shreds. What a pain.
Found that Andy Henroid, the former maintainer of ACPI has disappeared, and a fellow from Intel called Andrew Grover has taken over, which would explain why my ACPI queries were not answered. I resent a fix for acpid's use of unbounded sprintfs on buffers (which were on the heap, luckily).
When LC_ALL is set to most European locales, a comma (,) is used instead of a dot (.) as a decimal point. This means that sscanf does not function correctly on numbers using the . convention, and top dies saying "bad data in /proc/uptime" to stderr, but resets the terminal to some annoying values so it's difficult to read it.
I wrote a patch against procps-2.0.7 that tries to set the locale to C for the scanfs then restore it, so that numbers get read correctly but also printed in the user's locale.
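The same save, switch, restore pattern can be sketched in Python. This is an analogue of the approach, not the procps patch: the helper names are invented, and locale.atof stands in for sscanf because, like sscanf, it honours the current LC_NUMERIC setting.

```python
import contextlib
import locale


@contextlib.contextmanager
def numeric_locale(name="C"):
    """Temporarily switch LC_NUMERIC, restoring the previous value afterwards."""
    saved = locale.setlocale(locale.LC_NUMERIC)   # query the current setting
    locale.setlocale(locale.LC_NUMERIC, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def parse_uptime(text):
    """Parse the two dot-decimal numbers of a /proc/uptime style line."""
    with numeric_locale("C"):
        up, idle = (locale.atof(field) for field in text.split()[:2])
    return up, idle
```

Inside the with block numbers are read with "." as the decimal point regardless of the user's locale; once it exits, formatting for display uses whatever locale was active before.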
I submitted this patch to the maintainers of procps (RedHat) on 29 November 2000. They ignored me. Then this issue came up on linux-kernel and I reposted my patch. It got noticed by Albert D. Cahalan who maintains an alternative procps. Apparently, RedHat fixed the problem in their SRPM (by reimplementing scanf), but never fed the changes back into the tarball. Further it seems that the stock procps has some boundary segfault error conditions, but I couldn't trigger them.
Resent my patch to setsid(8) for util-linux. When you run a command from a shell with job control, it forks off all the programs in the pipeline and sets their program group ids to the process id of the first process in the pipeline. This means this process is program group leader. It therefore is forbidden to setsid(2) to get a new session id. Therefore setsid(8) doesn't work. My patch fixes this. Andries says it's in the queue.
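One common way round the restriction is to fork first, since a freshly forked child inherits its parent's process group and so is never a group leader. The sketch below shows that idea only; it is an assumption that the patch works along these lines, and the function name is invented.

```python
import os


def new_session_via_fork():
    """Fork, call setsid() in the child, and report whether it succeeded.

    Returns True if the child became the leader of a new session.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its process group id is the parent's, not its own pid,
        # so setsid() is permitted here.
        status = b"0"
        try:
            os.close(read_end)
            os.setsid()
            if os.getsid(0) == os.getpid():
                status = b"1"
        except OSError:
            pass
        finally:
            os.write(write_end, status)
            os._exit(0)
    # Parent: collect the child's one-byte report and reap it.
    os.close(write_end)
    report = os.read(read_end, 1)
    os.close(read_end)
    os.waitpid(pid, 0)
    return report == b"1"
```

A real setsid(8) would exec the requested command in the child after the setsid() call instead of just reporting back.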
Finally sent in solutions to 2000.5 and 2000.6.
Not even so good as last year. Just realised for two problems, I found necessary and sufficient conditions then forgot to iterate the actual answers. Oh well.
Version 0.1.8 released with a shutdown -p now program. Also wrote an email to Richard Gooch about it.
Finally release advanced off button policy patch. Thanks to Linus' post release code freeze (is it just to get the ReiserFS people or is there really no ulterior motive?) it probably won't get accepted by any kernels any time soon.
AAB + complicated roman numerals for two (?) STEP papers to read maths. Apparently the grades I get in General Studies and German don't count.
Went back on a crusade to get to be able to cleanly shut down and turn off computer by pressing the power button. I should like to trap the system event caused when the on/off button is pressed on my ATX-based case. It can be configured by the BIOS (with APM enabled) to suspend the computer. It is also technically feasible to trap this event with APM on my computer, as it is sent to the OS. However, under Linux, there is no way (from userspace) to say that the APM (user suspend) event should be rejected. Stephen Rothwell posted a patch to linux-kernel that (among other things) would allow that. However, it seems to have been ignored by Linus' creative patch filing system and is now languishing on Stephen's website. It won't apply to test13-pre4. Sent a message complaining about it to linux-kernel.
Next stop was ACPI. Had a quick dekko at the in kernel stuff - seems to be able to notify userspace about a power button press, super. I booted up with "apm=off" and the jolly old ACPI came on line. I dashed off to CVS the latest acpid (which itself could have been a bit of a challenge if I hadn't been lucky - remember to set CVS_CLIENT_PORT=22 and CVS_RSH=ssh). There was the traditional kernel header bungling - the current linux-2.4.0-test13-pre4/include/linux/acpi.h does not compile with acpid because u64 is called __u64 to avoid polluting userspace's namespace. But anyway, the function declaration should be bunged in the #ifdef __KERNEL__ for obvious reasons. Patch submitted to maintainer.
Finally got the daemon up and running. The nasty thing just ignored me, however hard I pressed the button. Typical. I checked that it should be logging things OK, and yes it should, except that the acpid_log function used to do it has a nasty buffer overflow vulnerability (admittedly it is of a variable on the heap, so a bit more ingenuity than usual is required to exploit it, but there is plenty of vital heap data - all those fixups used to implement ELF shared libraries for one thing). Mentioned this to maintainer as well :-). Hardware related userspace has to have elevated privileges but the people writing it are usually not that au fait with security issues - here it was just a case where vsnprintf should have been used as opposed to vsprintf.
Was the POWER_BUTTON function supported by my BIOS? Had a flick through the ACPI specs, until I hit upon the FACP (called in Linux the FADT apparently). The last few bits of this specify the supported features - if they're zero, the feature's supported (can't argue with that kind of logic, can you?). It would appear that POWER_BUTTON (called ACPI_PWR_BUTTON in Linux) and SLEEP_BUTTON (called ACPI_SLP_BUTTON) are both supported, which is strange as I have only one button, but ours not to reason why. That means that the thing should be activated in acpi_thread in linux/drivers/acpi/driver.c, but for some reason it has no effect. Sent a last desperate plea to the maintainer. It seems that the ACPI HOWTO has no help for me.
It would appear that the best course of action would be to try to hack the latest kernels to support the APM_IOC_REJECT ioctl used by apmd. Which appears to have been rather elementary. The apm driver (linux/arch/i386/kernel/apm.c) seems to go to well nigh extraordinary lengths to be friendly to userspace processes reading /dev/apmctl - for example, a process will receive all APM events on the queue when it does a read except the ones that it caused itself (a quite deliberate test is made to ensure this) and it tries to behave well when multiple processes send multiple suspend events - what a waste of time. IMHO the bally thing should be rewritten with a global state saying where you are at the moment, and a single reader like acpid, that gets to decide policy. In fact, emulating the acpid queue is probably the way to go. I mean to say, handling power management requests is hardly performance critical and so you can clamp down on as many locks as necessary - if you want more than one reader, let userspace handle it (like acpictl is going to do when it's finished). One point is that my APM BIOS seems to time out waiting for the OS to suspend the machine or reject the suspend event, so the userspace policy daemon should be realtime priority or something.
Moved umount filesystems etc. into a helper binary. If the original init binary is deleted, then it is impossible to umount the filesystem it is on while it is still running. The final fix for this problem requires that there be some way of passing init state between instances of the binary, as has been on the TODO for a while.
Fixed all the bugs I could find in my readline patch. Seems to have speeded it up extremely as well.
Spoke too soon. Changed locking convention - now signals are blocked except at a few points. Ready to go live!
No more visible bugs. Going to announce on freshmeat.net.
Purged my netscape bookmarks to update my list of useful links for A level German.
Finally, after nearly two weeks of labour, the world is presented with the world's most advanced init for workstations ever(tm). Most of the bugs seem to be fixed (honest).
Clear the tty and reset any complicated state set on it, like the redhat system.
Fixed strace-cvs build problems. Patch submitted.
Wrote up solutions to these two problems in the Crux Mathematicorum magazine.
Finally come to fix klogd problem. In sysklogd 1.4, the klogd does not work to log messages to the syslogd from 1.4. Other programs log messages correctly, so I suspected that klogd was incorrectly set up. However, I found that even syslog calls as the first code in main() were ignored. After a little investigation, I found that klogd from 1.4 does work with the syslogd from 1.3. After some futile fiddling with syslogd, I found that this problem is not due to syslogd either.
In fact, it was caused by a dodgy syslog.c linked with klogd, that overrode the version in the C library and used SOCK_STREAM instead of SOCK_DGRAM. The problem can be fixed by removing all references to syslog.o in the Makefile.
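The socket type mismatch is easy to reproduce in isolation. The sketch below stands a datagram socket in a temporary directory in place of the syslogd log socket and then tries to log to it with each socket type; the paths, message and function names are made up for the example.

```python
import os
import socket
import tempfile


def try_log(path, sock_type, message=b"<6>kernel: test"):
    """Try to send one message to the Unix socket at path; True on success."""
    client = socket.socket(socket.AF_UNIX, sock_type)
    try:
        client.connect(path)
        client.send(message)
        return True
    except OSError:
        # A stream client cannot connect to a datagram listener.
        return False
    finally:
        client.close()


def demo():
    """Return (datagram_ok, stream_ok, bytes_received_by_listener)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "log")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        listener.bind(path)
        try:
            dgram_ok = try_log(path, socket.SOCK_DGRAM)
            stream_ok = try_log(path, socket.SOCK_STREAM)
            received = listener.recv(1024) if dgram_ok else b""
        finally:
            listener.close()
        return dgram_ok, stream_ok, received
```

On Linux the stream attempt fails at connect, so a logger built this way has its messages dropped while every datagram client carries on working, which fits the symptoms above.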
Strangely, it seems that the select() in syslogd kept being interrupted, without any signal being seen in the debugger. What causes this effect? It appears even when the debugger is not run.
Last modified: Sun Feb 27 17:03:06 GMT 2005