Discussion:
Weird WLAN problem after stupid upgrade attempt
Tom Seago
2008-01-04 10:37:10 UTC
Permalink
I have managed to turn what was a working G1G1 machine into a machine
which can no longer see its wireless card on the USB bus. I thought
I caused this to happen with the following software gyrations, but it
is quite possible that I am the victim of an unfortunately timed and
entirely coincidental hardware failure. If anyone has any pointers to
help me figure out which of those two is the case - I will be
eternally grateful (or at least grateful for a really really long time
indistinguishable from eternity :) ).

The stupid part is how I started on this little odyssey of mine. I
stupidly tried to just run "olpc-update joyride-1492" from the command
line without having a developer key. The update process appeared to
run fine (which one could argue it should NOT have done if it wasn't
going to work later), but when the machine rebooted I got the "boot
failed" screen. Realizing that I probably _needed_ that whole
developer key thing, I attempted to give up on this ill-fated upgrade
by doing the "circle key" boot from the alternate OS image.

When doing the circle key reboot, the machine rebooted fine, except I
had no networking. No lights, no nothing. After some irrelevant
silliness on my part where I didn't check the $PATH variable but
started freaking out that ifconfig couldn't be found, I decided what I
needed was a fresh install to get everything back to the way it was so
I could continue to sit in my corner with the shipping version of the
software and avoid the joyride that I was clearly not qualified to be
on.

Thus, I found the wiki page about downloading the os653.img and fs.zip
files, I threw those on a USB stick, and did the "all buttons, I
really really mean it" reboot. The reflash proceeded without a hitch.
The machine restarted, had lost the things I had downloaded and its
nickname (as expected), but alas - still no network.

At this point, I dug in to the exact situation as best I could. I
noticed an "eth0 no private ioctls" message during startup, which led
me to this old ticket http://dev.laptop.org/ticket/1969 . That
ticket describes exactly what I have going on, insofar as the
wireless card has apparently disappeared.

Having a second, working G1G1 machine, I was able to determine that
normally there are 3 devices listed in /proc/bus/usb/devices on a
working machine. On my broken machine I only have 2. Also, the
usb8xxx and 802.11 modules aren't loaded - but that's not surprising
if the usb device wasn't found as the kernel loaded.
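A minimal sketch of that device-count comparison, assuming the usual usbfs layout in which each device record in /proc/bus/usb/devices begins with a "T:" topology line. The function names are illustrative only and are not part of any OLPC tool.

```python
def count_usb_devices(devices_text):
    """Count device records in text read from /proc/bus/usb/devices.

    Assumes each record starts with a topology line beginning with "T:".
    """
    return sum(1 for line in devices_text.splitlines() if line.startswith("T:"))


def read_device_count(path="/proc/bus/usb/devices"):
    """Read the usbfs device list and return how many devices it reports."""
    with open(path) as f:
        return count_usb_devices(f.read())
```

Running read_device_count() on both machines would give the two numbers to compare: 3 on the working machine and 2 on the broken one, per the description above.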

Looking in /var/log/messages from my good machine I see the following
during a _good_working_ boot.

Jan 4 17:59:56 localhost kernel: [ 19.120404] hub 1-0:1.0: USB hub found
Jan 4 17:59:56 localhost kernel: [ 19.120730] hub 1-0:1.0: 4 ports detected
Jan 4 17:59:56 localhost kernel: [ 19.244457] ohci_hcd 0000:00:0f.4: OHCI Host Controller
Jan 4 17:59:56 localhost kernel: [ 19.269021] ohci_hcd 0000:00:0f.4: new USB bus registered, assigned bus number 2
Jan 4 17:59:56 localhost kernel: [ 19.285507] ohci_hcd 0000:00:0f.4: irq 10, io mem 0xfe01a000
Jan 4 17:59:56 localhost kernel: [ 19.404162] usb usb2: configuration #1 chosen from 1 choice
Jan 4 17:59:56 localhost kernel: [ 19.428806] hub 2-0:1.0: USB hub found
Jan 4 17:59:56 localhost kernel: [ 19.444807] hub 2-0:1.0: 4 ports detected
<-- Same up to here -->
Jan 4 17:59:56 localhost kernel: [ 19.460720] hub_port_wait_reset: portstatus=503 portchange=10
Jan 4 17:59:56 localhost kernel: [ 19.533567] usb 1-1: new high speed USB device using ehci_hcd and address 2
Jan 4 17:59:56 localhost kernel: [ 19.580477] Initializing USB Mass Storage driver...
Jan 4 17:59:56 localhost kernel: [ 19.610342] hub_port_wait_reset: portstatus=503 portchange=10
Jan 4 17:59:56 localhost kernel: [ 19.720163] usb 1-1: configuration #1 chosen from 1 choice
Jan 4 17:59:56 localhost kernel: [ 19.752491] usbcore: registered new interface driver usb-storage
Jan 4 17:59:56 localhost kernel: [ 19.768864] USB Mass Storage support registered.
Jan 4 17:59:56 localhost kernel: [ 19.799984] usbcore: registered new interface driver libusual
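
Finding the "same up to here" point by eye is error-prone because the timestamps differ on every boot. A rough sketch of one way to automate it: strip the syslog prefix and kernel timestamp from each line, then report the first line where the two logs disagree. The helper names are made up for illustration, and the prefix pattern is only assumed to match lines shaped like the ones above.

```python
import re

# Matches a prefix such as "Jan 4 17:59:56 localhost kernel: [ 19.120404] ".
# Assumption: every line of interest has this syslog + kernel timestamp shape.
_PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s+\[\s*[\d.]+\]\s*"
)


def strip_prefix(line):
    """Drop the date, host, and kernel timestamp, keeping only the message."""
    return _PREFIX.sub("", line.strip())


def first_divergence(good_lines, broken_lines):
    """Return (index, good_msg, broken_msg) at the first mismatch, or None."""
    good = [strip_prefix(line) for line in good_lines]
    broken = [strip_prefix(line) for line in broken_lines]
    for i, (g, b) in enumerate(zip(good, broken)):
        if g != b:
            return i, g, b
    if len(good) != len(broken):
        # One log is a prefix of the other; report where the shorter one ends.
        i = min(len(good), len(broken))
        return (
            i,
            good[i] if i < len(good) else None,
            broken[i] if i < len(broken) else None,
        )
    return None
```

Feeding it the relevant stretch of /var/log/messages from each machine would point at the first kernel message that differs between the good and the broken boot.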
Chris Ball
2008-01-04 16:12:12 UTC
Permalink
Hi Tom,

Once you have your developer key, please run:

ok test /wlan

If that fails too, it should be clear that we need to RMA and replace
your laptop. Thanks!

- Chris.
--
Chris Ball <***@laptop.org>
Mitch Bradley
2008-01-04 18:22:18 UTC
Permalink
...
Another thing I have done is run the POST diagnostics by holding the
left rocker button during boot. I did this on both machines at the
same time to diff the results. Both say that usb port 0 is in use -
good. But the working machine did scroll some wlan diagnostic
information up the screen at the end of the video tests that the
broken machine did not do. The broken machine did not report an error
- but it clearly did not run the same wlan test.
Hi Tom,
ok test /wlan
If that fails too, it should be clear that we need to RMA and replace
your laptop. Thanks!
The POST diags that Tom ran include the "test /wlan" step. That
diagnostic info that scrolled by is a dump of the access point scan
info. It happens just before the touchpad test.

My recommendations:

a) Remove all power - AC and battery - for a few seconds to reset the
wireless really well, then reboot and try the POST diags again.

b) If that doesn't fix it, email me when you get your developer key and
I'll work with you on IRC to see if we can learn more about the failure
details.

Mitch Bradley
Tom Seago
2008-01-04 19:35:06 UTC
Permalink
Post by Mitch Bradley
a) Remove all power - AC and battery - for a few seconds to reset
the wireless really well, then reboot and try the POST diags again.
During my contortions I did try this step several times (ok, 3 or 4
times at least, maybe not _several_). No luck. I'll give it one more
good hour long no-power soak later tonight just for grins because it
certainly can't make anything worse at this point.
Post by Mitch Bradley
b) If that doesn't fix it, email me when you get your developer key
and I'll work with you on IRC to see if we can learn more about the
failure details.
Thank you very much for the offer. The web page tells me I should get
my key tomorrow, so I will try to find you after that.
Tom Seago
2008-01-10 00:20:02 UTC
Permalink
I've updated to the Q2D08 firmware now.

Whereas the '07 firmware seemed to run the test /wlan ok, but the wlan
card wasn't visible to the regular OS, now when I run the same command
on the '08 firmware I get

ok test /wlan
Device /wlan not found.
ok

So what's the verdict? Do folks think I should RMA this thing? It
seems like it got busted merely via software - which one would
normally not expect to be possible.

(-: Tom ;-)
Hi Mitch. I got my developer key and ran "test /wlan" at the ofw
prompt. That seemed to work! At least, it scrolled a fairly large
amount of info up the screen which appeared to be the stats of
various visible wifi networks. However, all my problems where the
OS can't see the card remain.
It makes me happy that this appears to be software not hardware. It
still confuses me how I managed to get into this bizarre situation
though. Is there a way to do any more complete of a device wipe
beyond the normal reflashing procedure using the signed build from a
usb stick?
I appreciate your help. I hope we can figure out what's up with this
thing.
(-: Tom ;-)
Mitch Bradley
2008-01-10 00:27:02 UTC
Permalink
If you revert to Q2D07, does "test /wlan" work again?

Q2D07 and Q2D08 have different versions of the wlan firmware wad.
Perhaps there is something about your wlan hardware that works with the
wlan firmware image in Q2D07, but not with the one in Q2D08 and not with
whichever version is in your OS version. That is one of the only two
hypotheses that come to mind. The other one is the possibility that
there is something flaky about the onboard USB connection, so that the
device sometimes enumerates and sometimes doesn't, depending on subtle
factors.
Tom Seago
2008-01-10 02:13:43 UTC
Permalink
Well, darnit. Looks like things are flaky. Here is the procedure and
the results I just got.

- Flash q2d07.rom

test /wlan worked fine

- Flash q2d08.rom (to verify that 08 is causing problems)

test /wlan fails - device not found

- Flash q2d08.rom (because I wanted 07 and mistyped the filename)

- Flash q2d07.rom

test /wlan fails - device not found

- Pull battery and AC, wait 20 seconds

test /wlan fails - device not found

- Flash q2d07.rom again

test /wlan fails - device not found

test /usb - shows port 0 in use, same as on my working machine. This
is with no USB sticks connected. When a USB stick is connected, then
ports 0 and 1 are in use.

So now that test /wlan is failing with '07, that encourages me to
believe a simple hardware issue is occurring. Although the WLAN card
is a USB device, I'm assuming it is surface mounted and there isn't
any sort of physical connection that could have shaken loose, is
there? I'm not opposed to the use of a screwdriver, but I haven't
bothered yet since I doubt that I could do anything.
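
The reasoning here (the same firmware both passing and failing points away from the firmware image) can be sketched as a small check. The list below is transcribed from the procedure above; the function name is illustrative only.

```python
from collections import defaultdict

# (firmware flashed, did "test /wlan" pass?), in the order reported above.
# The second q2d08 flash is omitted because no test result was given for it.
RUNS = [
    ("q2d07", True),
    ("q2d08", False),
    ("q2d07", False),
    ("q2d07", False),  # after pulling battery and AC for 20 seconds
    ("q2d07", False),  # after flashing q2d07 again
]


def inconsistent_firmware(runs):
    """Return firmware versions that produced both a pass and a fail."""
    outcomes = defaultdict(set)
    for firmware, passed in runs:
        outcomes[firmware].add(passed)
    return sorted(fw for fw, seen in outcomes.items() if len(seen) > 1)
```

With the runs above, inconsistent_firmware(RUNS) reports q2d07, which is what suggests the result does not depend on the firmware version alone.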

(-: Tom ;-)
John Watlington
2008-01-10 03:50:02 UTC
Permalink
Tom,
I would RMA the unit, going through Adam Holt instead of the
normal process. We want to see this unit ourselves before sending
it back to the manufacturer, as we haven't seen this problem in the
past.

The WLAN is a separate module, surface mounted to the motherboard.
There have been occasional cases in the past where insufficient paste
was present to properly solder it down, but they were caught in board
level testing before assembly, and haven't been seen on mass
production units. Given the lead-free soldering process we are
using, solder cracks are also a possibility.

If you decide to take a look yourself, all you need is one #1 Phillips
head screwdriver. I wouldn't even disconnect the LCD, just move it
enough to get to the four screws holding on the back cover located
behind it. A total of twelve screws need to be removed to get to the
motherboard.

Cheers,
wad
_______________________________________________
Devel mailing list
http://lists.laptop.org/listinfo/devel
Dan Krejsa
2008-01-10 05:02:33 UTC
Permalink
Post by John Watlington
Given the lead-free soldering process we are
using, solder cracks are also a possibility.
Or tin whiskers?
John Watlington
2008-01-10 05:13:18 UTC
Permalink
Post by Dan Krejsa
Post by John Watlington
Given the lead-free soldering process we are
using, solder cracks are also a possibility.
Or tin whiskers?
How'd you know what my nightmares look like?

Yes, but I don't expect tin whiskers to form within a month of
fabrication.

wad
