Houston, We Have a Coredump


During the initial 136-day shift aboard the International Space Station, Commander William Shepherd kept a mission log, which principally reads as a long list of the frustrations of life aboard an orbital bucket of bolts. I cannot describe how funny it is.

There are lots of gems: unexplained mechanical breakdowns, tools used in ways Ground Control never intended, odd DVD choices (Apocalypse Now?), and plenty of "redacted passages." But the best part has to be the computer problems. Anyone who's ever used a computer will sympathize, but as a software developer reading this stuff, I just want to hide my head in shame on behalf of my ignoble profession. Some highlights:


9 November: 'Laptop ergonomics' is an oxymoron.

We are out of laptop desks for the SSC's and Wiener. We have 3 and they are all deployed. (Just for background we have 9 laptops deployed and 1 or 2 more that we might want to use.) 2 more desks with bracket hardware would be handy. Until then, we are in a "make our own" mode, and intend to fabricate a substitute desk out of structural discards from Progress or a food container lid.


10 November: We know there's no good technical documentation on Earth, but there's apparently not any in space, either.

Finding the written instructions on cable and connectors very elaborate. This is the sort of thing where a good sketch would greatly help. Some of the English translation misses a bit of the Russian nuance. Russian instruction says look "in the vicinity of hermetic plate" and the English translation was "look on the plate". The connector Shep needs is, in fact, on a wire bundle "in the vicinity" of the plate. (lose about 30 minutes sorting this one out). Total work on the job is about 9 man hours.


10 November: When Windows prompts you to create a boot disk, maybe you should.

We were configuring SSC 2 to run a CD when it decided to lock up. After repeated attempts to restart, Shep and Sergei went through a long attempt to extract files from the SSC's hard drive before reloading the SSC software. Used the startup disk in the onboard software suite, but could not find a particular file while hunting around with DOS. This would have been much easier with some bootable media (CD-ROM?) that could run Windows. (Or if Shep was not indoctrinated by that "other" operating system). We may need an emergency boot capability again. After 5+ attempts, finally got the hard drive to take an image off the ghost CD. One of the Autoloader floppies went down, but SSC 2 is now running normally. (3+ hours troubleshooting).


11 November: Uh, Expedition One, can you please hold while I escalate your issue to my supervisor?

Approx 1930 experienced a "crash" with the Russian PCS laptop. Attempting to reboot the PC gave indication that the Sun OS would not load. Boot s/w can not read root directory correctly. Even Sergei didn't understand this one. Talked with TsUP and decided to wait for specialist advice tomorrow.


14 November: Been spending too much time on Napster, apparently.

OCA file transfer problems in the afternoon. Did several reboots, cleaned off large avi files, dismissed unnecessary apps. OCA still appears to be running slow although lots of storage and RAM available. We are thinking to try again Wednesday with ethernet network card turned off, to reduce processing demand on OCA. Will wait for ground OK to do this.


16 November: Relaxen und watchen das blinkenlichten.

The only outstanding unit is the printer. We have about three blinking lights on the back panel of the printer but it's not on the net. We are out of ideas on how to troubleshoot this one and we need some more input from the ground on what to do next.


24 November: We're not sure why it works, but we're not asking any questions.

Reattempted the Winscat test by logging in as a new user. It worked, and seems to associate Shep with his previous data in database. Ground may want to look at this.


25 November: There's no such thing as a universal file format.

More on fonts. Major step forward. Sergei is on the Wiener checking out a CD with the Russian ODF on it and all the new Russian data file symbols can be read by "Word". Someone on the ground has created a file to interpret and display these symbols. We believe this could be added to the Windows environment on the SSC, so all Russian data file could move around the network as word documents vice having to be in the PDF format we are now using. We need to do this also.


27 November: Dual-booting can be hazardous to your health.

Sergei notices that the Russian PCS laptop has locked up. He tries to reboot, but the Sun application software won't load. Lots of messages on the screen noting data errors. Sergei thinks that it may be the hard drive. He boots up windows to see if the windows partition runs OK--it does. So at least some of the hardware is functional.


2 December: Dammit Jim, I'm an astronaut, not a sysadmin!

Everyone at a laptop to read mail and sort through the message traffic. We all are seeing some problems. Sergei moved all his mail to a personal folder, yet his ".ost" file is still over 1 Mb. Shep can't run outlook at all on the MEC configured with the SSC 2 hard disk. Yuri is having trouble doing mail in Russian. He needs help on fonts. We feel pretty much like a bunch of campers when it comes to mail-server problems. A little more of the nuts and bolts of how this all works would be useful to us and could help us work better with the ground during troubleshooting sessions.


5 December: Strangely enough, the spammers found them immediately.

Sergei and Yuri still having some email problems. Apparently friends in Russia do not have the right email address to reach them.


19 December: Proof of artificial intelligence.

Ate some dinner and watched disk #2 of "Lethal Weapon 3" (It's Lethal Weapon Week) although the disk kept crashing about 10 minutes from the end.


20 December: Even in space, you can still surf the Web.

Some confusion on where the Z1 storage is listed. Database shows it as "outside" Node. IMS is running very sluggishly on SSC 2-don't know why, although some other applications were open (word, explorer).


22 December: Every hardware engineer is a pragmatist at heart.

Back into it with OCA. The procedure to connect up the data line calls for us to pull out 2 data lines which run aft the length of the FGB to the ??? and put their aft terminals next to connectors in the ?? so we can check continuity. We leave cables in place and jumper the pins on each end of the cable one at a time to ground and check continuity (and open circuit) that way. Probably saves us 1-2 hours. Also using the scopemeter for this kind of work is sort of overkill. A small test-light probe would do fine here.


23 December: When was the last time you took your network down to watch a DVD?

Reconfigured the Wiener for the DVD drive set up after dinner and tried one of the DVD movies. This is definitely the way to go. Video and sound quality much improved over the CD-ROM disks. Only down-side for us is the network has to come down when the Wiener gets configured for DVD, but we figure for Saturday night, it's worth it.


27 December: Ah, the Registry.

We got the SSC file server backed up per the daily plan. Finally have this configured so we don't have to change any hardware. Backup took a while. Kept getting messages that "registry" was full, although backup eventually completed itself. We believe that the server is trying to handle a lot of program transactions, and this is taking most of the computing power it has.


28 December: Better than just blinking "12:00," I suppose.

Around lunchtime, we missed another Earth Obs site, and we figure it could be for several reasons-Yuri's laptop is gaining a couple of minutes each day.


29 December: Asking people who refer to "memory space on the disk" to administer NT servers is asking for trouble.

We do the MPV update on the file server per the OCA note. MPV load does not seem to copy completely and server has a number of error messages. We are apparently out of memory space on the disk, although we're not sure exactly how NT manages its memory. Wait to talk to Houston. We discuss this later in the day, and then delete all the MPV files which frees up about 800 Mb. We also plug in one of the 1 Gb PC cards, so at least for the short term, the server has some more storage space. We would like to know a little more about the long term plan to manage storage on the server--we were kind of wondering when the hard drive was going to get full. Answered that question today.


3 January: Everyone hates PCMCIA cards.

Backed up the file server after lunch. Learned that when we pull the PC card now being used with the server for more memory, it loses the "sharing" property when it leaves and this has to be redone when it is reinserted. Backup was normal, and is going much faster now that we leave the server configured with the card extender. We just wish they were not so flimsy.


4 January: And press the "any" key to continue.

We did the time update on the SSC file server. When the server time was reset using the clock icon from the system tray, the "apply" button was used. (procedure did say use the "OK" button) This put the update program in some type of loop which we could not get out of. Had to shut down the server and restart. When the server came back up, correct time was resident. We don't know if this is a one-time anomaly or there is some problem here. Went ahead and broadcast the new time to the network.


8 January: The search for signs of an intelligent search engine.

The browser is working well, and this is a much more convenient system for finding things than doing it on paper. Searches on the messages onboard folder could be better, but it is workable. A keyword search feature here would be most useful.


8 January: Are you sure it's not set to use A4 paper?

Trying to print out the OCA messages about the IMS details. Printer is still acting up and printing half pages. We have been feeding it strange paper, (green) and wonder if that has offended it. We try the reboot technique sent up from the ground, and that seems to help for a while. Then the printer goes back to its old ways. We would still like to know how to change permissions so that the client SSC's can cancel print jobs in the print queue-we are still blocked from doing this.


15 January: I would have sworn it was right here.

We could not find one of the video cables for about a half hour until Sergei remembered that we had used it for the OCA hookup for TV from Progress.


16 January: What does a fellow have to do around here to get a decent headset cable?

We find fairly quickly that the "sleeve" wire does not connect between both ends of the cable on the leg to the microphone jack. We cut the wiring and do a check again, and isolate the problem to the end with the 1/8" plug, which fortunately, is right where we made the cut. Tried to take the plug apart, but this was all glued together. We pulled the wiring off a set of Sony CD speakers to get another jack that would fit the back of the laptop. We stripped and spliced the wire, which was very fine-24, maybe 26 gauge. We don't want to use our few butt splices for this, and the wires are too fine anyway, so we pull out the soldering iron to see how that works.

First problem is that we can't plug the iron in. Plugs are Mir-style, and apparently the sockets in the SM are different (more leads). So we do another IFM to hook the iron up. Then the little soldering tube on the end won't fit the iron-it's the wrong size. So out comes the Contingency Clamp Kit and we safety wire the tube to the end of the iron. It works. We tin the leads, put some tubing on them and insulate with electric tape. (The Russian side did provide this, and it does work.) We get the cable hooked up, and do a mike check. We have an OCA hookup with the Chief Astronaut and the system is working. That's the good news. Bad news is that we now have another failure-looks like the earphone cable is bad-maybe the same type of wire-connector failure. Today's IFM took us maybe 3 1/2 hours. But the external laptop speakers are working and we have the OCA comm. link back.


18 January: There's nothing like a well-stocked supply room.

We would like to ask that 5A show up with enough gear so that we end up with at least the following spares, which will be above the 5A outfitting requirements:

  • 3 spare hard drives-3 GB or bigger
  • 3 spare network dongles and cards-we have no working spares left
  • 3 spare PC card extenders-we are finding that every laptop needs these
  • more spare coax-at least--
    • 2 X 25 feet
    • 4 X 10 feet or 4X 6 feet
    • 6 X 3 feet
    • 10 each of the T, Coupler, and Terminator connectors


19 January: That's what you get for using Outlook.

We are continuing to see some strange things on our email-particularly the sizes of files. We are trying to keep minimum messages in our inbox and outboxes, and we still see large ".ost" files moving to the ground-2.5 Mb each. We don't understand why these are so large.


19 January: All Ground Control Operators are currently busy helping other space stations, please stay on the line and your call will be answered in the order it was received.

Stepping through the diagnostic procedure does not go as planned. On the first step, loading protocol 11001, we get the exact same indications as before-"download complete" followed shortly by "error-missing axis". Ground notes to "bypass" this do not work. We can not get the control software to do anything else. "Enter" key just puts us back into the top of a loop with MACE asking for a new protocol. We relay this word to Houston and standby.


22 January: Let me guess: Outlook again.

Sergei is still having difficulties with his email. After the mail sync, he still has "outgoing" mail left instead of everything in the "sent" folder. We talk to Houston about this, as this has occurred a few times now with Sergei's files.


23 January: Uh, yeah, we're having trouble repro'ing this one in the lab; do you think I could come over there and debug it on your machine?

The file server is acting up. Transactions between the client laptops and the server have much more delay than we have seen previously. Yuri had similar delays last weekend, but we have not worked intensively with the IMS since then. Today we are all inputting transactions and the system seems like it just can't keep up. Calls for data on individual objects has been taking 3-5 seconds. Today it seems like even the routine data from the server has significant delay-sometimes taking up to 4 minutes to complete. "History" pages are particularly slow in downloading. After a while, Sergei reboots the file server, but this does not improve the situation.


24 January: And you thought dogfood email was bad.

We are still having unusual email problems. After a late afternoon mail sync, we all had "outbox" mail which had not been taken, and nothing new put in the "inbox", so we think some part of the ground sync and upload to us was incomplete. Our mail problems have definitely been more frequent in the past 3 days, particularly for Sergei. He is pretty sure mail has been mislaid and he needs a way to account for what has been sent to him in the last 2 weeks.


25 January: Who the hell resolved this one "By Design?"

Shep starts the MACE troubleshooting in the Node. We step through the procedure and finally get the MACE to operate after a PC card is swapped out in the secondary drive. The protocol we tested with, 11001, looks OK. The first protocol on the priority list is run, but it does not look as though MACE is doing anything. We call this down to the ground and Houston says this is expected.


9 January: "Just ship it. Nobody will ever need that many alarms."

We have been working with the Timex software. Many thanks to the folks who got this up to us. It seems we each have a different version of the datalink watch, and of course, the software is different with each. Yuri and Sergei are able to load up a day's worth of alarms, but Shep has the Datalink 150, and this has a 5 alarm limit. So 2/3 of the crew are now happy. All this is a pretty good argument for training like you are going to fly-we should have caught this one ourselves in our training work on the ground.


30 January: Contains fixes for the American, Far East, and Outer Space editions.

Shep and Yuri update the file server with service pack 7. No problems. We reboot the server and it runs well all day.


5 February: NASA: we put the "Universal" in "URL."

Apparently Houston's been having some trouble with the OCA file transfer. We already saw that mail and some of the execute package did not make it up. We already have the OCA reboot started when Houston asks. We are missing some operational messages, and we don't seem to have the .htm file which points to each day's execute package. We ask Houston about this too, and finally get it squared away a few hours later.


7 February: Do you understand how routers work?

The Wiener comes "up" this a.m. with a blank screen, although it is still processing and "routing". Sergei reboots it and it runs normally.


19 February: You put WHAT on the CD-ROMS?

We have been receiving CD ROM's from the ground via Shuttle which have been difficult to read. We've had problems with both operational software and entertainment discs. These occasionally have some foreign material on the surface which gets in the way of the disk reader hardware. The new Russian laptop software image was the most significant example. We think there is some glue residue left on the disks from sticky notes or labels.


21 February: I'm sorry, you're breaking up. Are you going behind some space junk?

Mid afternoon, we do an OCA media event and a spot for the Houston Rodeo. The OCA is reverting to its previous mischief, where we are simplex comm. Every time we want to hear or speak, we have to toggle a button on the keyboard to switch this. We think this has got to be a software problem. Sergei comes up with a temporary fix where we swap the headset and speaker jacks, which gets us through the session today. Unfortunately, comm. on the uplink from the media rep was intermittently unreadable.


22 February: You work with computers, you have days like this.

The day really gets off to a bad start. The server connection to the net is down hard. We worked on it last night until 0100 and could not bring it up. We were doing the file server part of network reconfiguration yesterday. This moved the FS to the lab-we also extended the Ethernet lan from the Node into the lab (not part of the procedure). This allowed the server to rejoin the network without delay, rather than waiting much later when the RF access points are set up. The plan was working well, and the server was online from mid afternoon. At about 2200, we were reconfiguring some mail files which, with a lot of help from Windows NT, got put in the wrong place during the backup procedure. When we finished restoring the files, the network was down and would not come back up. We worked this for several hours. Finally, jiggling some cables brings just a part of the net back. (that really instills confidence in the stability of your network).

So as of 0700, we have to use the OCA machine for daily planning. Fortunately, ground has uplinked everything to the OCA's directories, so at least we have what we need onboard. But when we try and print, the printer locks up. It is not happy with the net now either. So Shep and Sergei start trying to figure out what is going on. After trying lots of other computer tricks that don't work, we put another network card in the server and that seems to fix the server problem. We power cycle the printer and that comes back. We are having a hard time understanding the how and why, but everything is working.


1 March: Never tell a network engineer how to do his job!

The RF access point (#1) is mounted in the aft hatchway of the lab. Square antenna is pointing in the nadir direction. We know this is not what ground wants, but the entire station is accessible on the RF net through this gateway. You can even be down in the Soyuz with your laptop and still stay on line. Putting this on the forward bulkhead as requested is just not suitable the way the lab is laid out right now.


2 March: UNIX isn't so friendly, either.

Up early. We were working late last night with the PCS configuration "patches", and wrestling with the UNIX commands. Laptops were reloaded and left shut down while other files were uploaded to the MDM's. The word from Houston this a.m. is to wait another rev to connect the first laptop so that we're sure the changes to the C&C computers are complete.


2 March: Outlook again. As usual.

Yuri is missing 5 emails in his outlook "Send" folder. He drafted these up last night, and they were left in his Outbox. They should be showing in his "Send" folder, but they're not there, and Outbox is empty. We think an old mail (ost) file was uplinked and overwrote what Yuri did. We call Houston to see if the outgoing files can be recovered. Houston puts this in work.