Topic: OA Multiplayer server interface kills NIC [not (exactly) a bug]  (Read 15716 times)
majuk
« on: August 20, 2009, 01:09:16 AM »

This is a bizarre one.

When I launch the Multiplayer Server Selection interface, I instantly lose all network connectivity.

In addition, I get periodic full lock-ups lasting ~15 seconds while on the Multiplayer Server Selection screen, and nowhere else in the game. They don't correspond to any button presses; I assume it's trying to resolve whatever network problem is keeping the NIC from routing.

When the network fails, I am unable to route beyond my local host.

dmesg, /var/log/messages and OA show no errors or warnings. netstat shows normal traffic. ifconfig, resolv.conf and route are unchanged.

OA ver 0.8.1
Xubuntu Jaunty ver 9.04

NIC is an onboard nVidia Gig Eth (XFI 780i mobo)
DHCP
Intel Core2Duo
GeForce 8800 GTS

Open to all suggestions. I'm going to test to see if this happens on my Gentoo install as well. I'll post the results here.

Thanks in advance.
« Last Edit: September 05, 2009, 06:52:18 PM by majuk »
Cacatoes
« Reply #1 on: August 20, 2009, 05:06:57 AM »

I can confirm you get "locked" when you refresh the server list, but I didn't suspect it could stop other network activity...
majuk
« Reply #2 on: August 20, 2009, 05:51:21 PM »

The locking doesn't correspond with clicking the Refresh button. It's just at random while on the server selection screen. My guess is it's trying to resolve the master server IP (as DNS lookups fail after launching the server selection screen), but that is only a guess.

The rest of the network goes down from the moment I open the OA Multiplayer server selection screen until... well, until it comes back.

I can resolve the problems in a few ways: bouncing the interface, running arp -a or waiting a few minutes after I exit the program. Again, I haven't been able to narrow down the problem beyond not being able to route outside the local host.

Does anyone know a way to break OA's window focus so I can run some diagnostics alongside OA? Right now I have to go in, break everything, then exit to do diagnostics, which is not exactly efficient. I should be able to get better information if I can do that.

::EDIT:: I don't know if I made it clear, but OA has no connection to the net either. The whole system is unable to route, it's not that OA can and everything else can't. Thanks again for the help guys.
« Last Edit: August 20, 2009, 06:09:34 PM by majuk »
majuk
« Reply #3 on: August 20, 2009, 06:15:10 PM »

OK, some more diagnostics: I took my switch out of the equation.

The net disconnecting is a direct result of whatever OA does to query the master server. When I launch it (and the list auto-refreshes), my system reports a link-down event. If I let it sit for 10-20 seconds, it reports the connection is back. Hit Refresh, and the connection reports disconnected again.

I feel like Hansel and Gretel following bread crumbs.
Falkland
« Reply #4 on: August 20, 2009, 07:41:41 PM »

Why don't you try running tcpdump with maximum verbosity (-vvv) before starting OA?

Code:
sudo tcpdump -s 0 -vvv -n -w oadump.dump

 -s 0 tells tcpdump not to truncate large packets
 -n tells it not to resolve addresses
 -w <filename> writes the dump to a regular file instead of standard output

After stopping tcpdump, you can open oadump.dump in Wireshark and inspect every single packet.

Try not to filter the capture, though, because you need to inspect all packets (ARP requests included).
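For a quick tally before opening Wireshark, here is a minimal Python sketch that walks the classic pcap file tcpdump writes with -w and counts ARP, DHCP, DNS and other UDP frames. It assumes the default microsecond-resolution pcap format and an Ethernet capture; the function names are made up for illustration, and Wireshark remains the better tool for reading individual packets.

```python
import struct
from collections import Counter

ETH_ARP = 0x0806
ETH_IPV4 = 0x0800
IPPROTO_UDP = 17


def classify_frame(frame: bytes) -> str:
    """Label one Ethernet frame as arp, dhcp, dns, udp, ipv4-other, other or short."""
    if len(frame) < 14:
        return "short"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETH_ARP:
        return "arp"
    if ethertype != ETH_IPV4:
        return "other"
    ip = frame[14:]
    if len(ip) < 20:
        return "short"
    if ip[9] != IPPROTO_UDP:  # protocol field of the IPv4 header
        return "ipv4-other"
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    udp = ip[ihl:]
    if len(udp) < 8:
        return "short"
    sport, dport = struct.unpack("!HH", udp[:4])
    if 67 in (sport, dport) or 68 in (sport, dport):
        return "dhcp"
    if 53 in (sport, dport):
        return "dns"
    return "udp"


def summarize_pcap(data: bytes) -> Counter:
    """Count frame types in a classic (microsecond) pcap capture of an Ethernet link."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:  # LINKTYPE_ETHERNET
        raise ValueError("expected an Ethernet capture")
    counts = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        counts[classify_frame(data[offset:offset + incl_len])] += 1
        offset += incl_len
    return counts
```

Usage would be something like `summarize_pcap(open("oadump.dump", "rb").read())`. A capture taken with `-i any` uses the Linux "cooked" link type rather than Ethernet, which this sketch rejects.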

I will try to do the same.

I know that OpenArena tries to access /proc/net when it starts because I've set up an AppArmor profile restricting the OA binary; this is the typical audit output that appears in the kernel log:

Code:
...
[29301.268971] audit(1250812068.736:6): type=1503 operation="inode_permission" requested_mask="::r" denied_mask="::r" name="/proc/net/" pid=7089 profile="$HOME/openarena-0.8.1/openarena.i386" namespace="default"
...

majuk
« Reply #5 on: August 20, 2009, 09:26:07 PM »

rofl, I've never seen a -vvv cli option before.

Experimenting and eating pizza. Results soon.
majuk
« Reply #6 on: August 20, 2009, 11:49:19 PM »

One step closer.

Here's a pastebin of the tcpdump output (I did a plaintext export with wireshark): http://dpaste.com/hold/83459/

I cut it down to the 33 packets I couldn't account for (they may not all be related to OA or this issue, but most are)

For the tldr crowd, here's my interpretation:

  • I see the DNS query to the master server go out and get a valid reply. I lose my connection at the moment the program connects to the server.
  • About 20 seconds later, my NIC starts broadcasting for a DHCP address and gets no response.
  • Another 20 seconds after that, it broadcasts for DHCP again; this time my router sends an ARP request for my old IP address (to which my machine does not respond), then OFFERs me the same IP I had before, followed by REQUEST and ACK.
  • After this, my computer ARP-broadcasts for my router (the gateway) again and we're back in business.

kern.log had no 'audit' calls or anything related to OA.

/var/log/syslog is a little more informative, but only marginally. At the moment of truth (^), it too only reports that the connection has gone down after losing carrier.

eth0 behaves in an identical manner to eth1 (both onboard GigE NICs)

You can see it dumps a route (*) after the line goes down, but what faulty route is getting put in and how?

::EDIT:: Re-ran the scenario with inotifywatch -r /proc/net/ running; it reported no changes.

Quote
Aug 21 00:17:45 slicktop NetworkManager: <info>  Activation (eth1) Stage 5 of 5 (IP Configure Commit) scheduled...
Aug 21 00:17:45 slicktop NetworkManager: <info>  Activation (eth1) Stage 4 of 5 (IP Configure Get) complete.
Aug 21 00:17:45 slicktop NetworkManager: <info>  Activation (eth1) Stage 5 of 5 (IP Configure Commit) started...
Aug 21 00:17:45 slicktop avahi-daemon[3432]: Joining mDNS multicast group on interface eth1.IPv4 with address 192.168.10.102.
Aug 21 00:17:45 slicktop avahi-daemon[3432]: New relevant interface eth1.IPv4 for mDNS.
Aug 21 00:17:45 slicktop avahi-daemon[3432]: Registering new address record for 192.168.10.102 on eth1.IPv4.
Aug 21 00:17:45 slicktop dhclient: bound to 192.168.10.102 -- renewal in 276770 seconds.
Aug 21 00:17:46 slicktop NetworkManager: <info>  (eth1): device state change: 7 -> 8
Aug 21 00:17:46 slicktop NetworkManager: <info>  Policy set 'Auto eth1' (eth1) as default for routing and DNS.
Aug 21 00:17:46 slicktop NetworkManager: <info>  Activation (eth1) successful, device activated.
Aug 21 00:17:46 slicktop NetworkManager: <info>  Activation (eth1) Stage 5 of 5 (IP Configure Commit) complete.
Aug 21 00:17:46 slicktop postfix/master[3109]: reload configuration /etc/postfix
Aug 21 00:17:47 slicktop ntpdate[6298]: adjust time server 91.189.94.4 offset 0.157980 sec
Aug 21 00:17:53 slicktop kernel: [ 3452.288006] eth1: no IPv6 routers present
^Aug 21 00:18:21 slicktop kernel: [ 3480.668856] eth1: link down.
Aug 21 00:18:21 slicktop NetworkManager: <info>  (eth1): carrier now OFF (device state 8)
Aug 21 00:18:21 slicktop NetworkManager: <info>  (eth1): device state change: 8 -> 2
Aug 21 00:18:21 slicktop NetworkManager: <info>  (eth1): deactivating device (reason: 40).
Aug 21 00:18:21 slicktop NetworkManager: <info>  eth1: canceled DHCP transaction, dhcp client pid 6241
*Aug 21 00:18:21 slicktop NetworkManager: <WARN>  check_one_route(): (eth1) error -34 returned from rtnl_route_del(): Sucess
Aug 21 00:18:21 slicktop avahi-daemon[3432]: Withdrawing address record for 192.168.10.102 on eth1.
Aug 21 00:18:21 slicktop avahi-daemon[3432]: Leaving mDNS multicast group on interface eth1.IPv4 with address 192.168.10.102.
Aug 21 00:18:21 slicktop avahi-daemon[3432]: Interface eth1.IPv4 no longer relevant for mDNS.
Aug 21 00:18:21 slicktop postfix/master[3109]: reload configuration /etc/postfix
Aug 21 00:18:35 slicktop NetworkManager: <info>  (eth1): carrier now ON (device state 2)
Aug 21 00:18:35 slicktop NetworkManager: <info>  (eth1): device state change: 2 -> 3
Aug 21 00:18:35 slicktop NetworkManager: <info>  Activation (eth1) starting connection 'Auto eth1'
Aug 21 00:18:35 slicktop NetworkManager: <info>  (eth1): device state change: 3 -> 4
Aug 21 00:18:35 slicktop NetworkManager: <info>  Activation (eth1) Stage 1 of 5 (Device Prepare) scheduled...
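The excerpt above is enough to time the outage: the kernel logs "eth1: link down" at 00:18:21 and NetworkManager sees the carrier return at 00:18:35. A small Python sketch that pulls such down/up pairs out of syslog lines and measures the gap; the message patterns are taken from this excerpt only, and the year is an assumption because classic syslog lines omit it.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Aug 21 00:18:21"
STAMP = re.compile(r"([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    match = STAMP.search(line)
    if match is None:
        return None
    # Syslog omits the year, so the caller has to supply it
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def link_outages(lines, iface="eth1", year=2009):
    """Return (went_down, came_back, seconds) for each kernel link-down / NM carrier-on pair."""
    outages = []
    down_at = None
    for line in lines:
        when = parse_stamp(line, year)
        if when is None:
            continue
        if f"{iface}: link down" in line:
            down_at = when
        elif f"({iface}): carrier now ON" in line and down_at is not None:
            outages.append((down_at, when, (when - down_at).total_seconds()))
            down_at = None
    return outages
```

Run over the full log (for example `link_outages(open("/var/log/syslog"))`), each refresh of the server list should show up as one tuple, which makes it easy to line the outages up against the tcpdump timestamps.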
« Last Edit: August 21, 2009, 12:16:46 AM by majuk »
davidd
« Reply #7 on: August 21, 2009, 01:12:01 AM »

My internet is a bit choppy at times (restarting the ADSL modem fixes it), but I experience something similar whenever the DNS query for the master server fails. When I click the "Multiplayer" button in the main menu, all animation freezes until the master server's DNS lookup returns.

My theory for what happens to majuk is different from my situation, though. Right after querying the master server, the client sends out a UDP packet (or more) to every OA server returned. To switches and routers that try to pretend intelligence, this looks like a UDP DoS attack, and that may kick some users off. Thanks, majuk, for reporting it, because it probably means another 10 or 100 people have the same bug and aren't reporting it. This way there is a chance it will get fixed.

So my question is: during the 20-second downtime, run /sbin/mii-tool and tell us the link state of your network cable.
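To put the burst theory in numbers: the rate a router sees is just packets divided by the window they leave in, and spreading the same queries at a fixed interval caps that rate. The sketch below is purely illustrative; the server count and rate cap are hypothetical values, not figures from OpenArena's netcode or from any particular router.

```python
def burst_rate(packets: int, window_s: float) -> float:
    """Packets per second if `packets` datagrams leave within `window_s` seconds."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    return packets / window_s


def paced_schedule(servers, max_per_second: float):
    """Fixed-interval pacing: (send_time_s, server) pairs spaced 1/max_per_second apart."""
    if max_per_second <= 0:
        raise ValueError("rate cap must be positive")
    gap = 1.0 / max_per_second
    return [(i * gap, server) for i, server in enumerate(servers)]


# Hypothetical example: 300 servers queried within half a second
servers = [f"server{i}" for i in range(300)]
print(burst_rate(len(servers), 0.5))  # 600.0 packets/s
schedule = paced_schedule(servers, max_per_second=50)
print(round(schedule[-1][0], 2))  # 5.98: the last query leaves after about 6 s
```

The trade-off is the obvious one: pacing keeps the rate under whatever threshold a router might enforce, at the cost of a slower-filling server list.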
majuk
« Reply #8 on: August 21, 2009, 02:10:29 AM »

mii-tool is apparently unsupported by my NIC.

ethtool works though, but I can't do a running monitor. So I went in game, broke it and got out ASAP.

ethtool reported the link as disconnected (*) when it went down.

Quote
majuk@slicktop:~$ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (65535)
        Duplex: Unknown! (255)
        Port: MII
        PHYAD: 1
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
       *Link detected: no
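Since ethtool has no continuous-monitor mode, one workaround is to poll the kernel's carrier flag from a second terminal while OA runs. A minimal Python sketch, assuming a Linux sysfs layout where /sys/class/net/<iface>/carrier reads 1 or 0 (and fails with an error while the interface is administratively down); the helper names are hypothetical, and the polling function takes the reader, clock and sleep as parameters so it can be tried without real hardware.

```python
import time
from pathlib import Path


def read_carrier(iface: str):
    """True/False from /sys/class/net/<iface>/carrier, or None if it can't be read."""
    try:
        return Path(f"/sys/class/net/{iface}/carrier").read_text().strip() == "1"
    except OSError:  # e.g. EINVAL while the interface is down, or no such interface
        return None


def watch_link(read_state, samples: int, interval: float = 1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() `samples` times; return (timestamp, state) for every change."""
    changes = []
    last = object()  # sentinel so the first reading is always recorded
    for _ in range(samples):
        state = read_state()
        if state != last:
            changes.append((clock(), state))
            last = state
        sleep(interval)
    return changes
```

For example, `watch_link(lambda: read_carrier("eth0"), samples=120)` samples once a second for two minutes and returns only the transitions, so a drop and recovery while the server list refreshes shows up as two entries.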
majuk
« Reply #9 on: August 21, 2009, 02:27:56 AM »

Pffft, well damnit, davidd hit the nail on the head... I think.

I plugged my cable modem directly into my comp and now the server selection screen works. So I guess the router was disconnecting me from its port due to... it being way too damn cool for its own good. Probably the flood of responses from all the servers.

Interesting, my router's logs are totally clear. What the stupid.

We can keep diagnosing this if you guys would like, but I'm thinking it was my router dropping my connection.
majuk
« Reply #10 on: August 21, 2009, 02:56:38 AM »

omg this is fun online. :DDDDDD
Falkland
« Reply #11 on: August 21, 2009, 09:01:05 AM »

What I see is that the link is lost after querying dpmaster and before receiving its reply. So either the query itself or something that happens right after it is sent is causing the disconnection.
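For context, this is roughly what the master query looks like in ioquake3-derived engines such as OpenArena: one connectionless UDP datagram to the master, answered by a getserversResponse packet that lists servers as records of a 4-byte IPv4 address plus a 2-byte big-endian port, separated by backslashes. The Python sketch below assumes that format, protocol 71 for OA 0.8.x and dpmaster's usual port 27950; check these against the engine source before relying on them. Sending a single query from a script is one way to tell whether the request alone drops the link or whether it takes the follow-up burst to the listed servers.

```python
import socket
import struct

OOB = b"\xff\xff\xff\xff"  # connectionless-packet prefix
RESPONSE_HEADER = OOB + b"getserversResponse"


def build_getservers(protocol: int = 71) -> bytes:
    """Master query; 71 is assumed to be the OA 0.8.x protocol number."""
    return OOB + f"getservers {protocol} full empty".encode("ascii")


def parse_getservers_response(packet: bytes):
    """Return [(ip, port), ...] from one getserversResponse datagram."""
    if not packet.startswith(RESPONSE_HEADER):
        raise ValueError("not a getserversResponse packet")
    servers = []
    i = len(RESPONSE_HEADER)
    while packet[i:i + 1] == b"\\":
        record = packet[i + 1:i + 7]  # 4-byte IPv4 address + 2-byte port
        if len(record) < 6 or record == b"EOT\0\0\0":  # truncated, or end-of-list marker
            break
        ip = socket.inet_ntoa(record[:4])
        (port,) = struct.unpack("!H", record[4:6])
        servers.append((ip, port))
        i += 7
    return servers


def query_master(host: str, port: int = 27950, timeout: float = 3.0):
    """Send one query and parse only the first reply datagram (masters may send several)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getservers(), (host, port))
        data, _addr = sock.recvfrom(65535)
    return parse_getservers_response(data)
```

`query_master` raises `socket.timeout` if no reply arrives, which in this scenario is itself a useful data point.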
majuk
« Reply #12 on: August 21, 2009, 09:09:34 AM »

I'll try to get confirmation that it's my router shutting down the connection and why later tonight sometime.
GrosBedo
« Reply #13 on: May 22, 2010, 05:31:32 AM »

I know this thread is old, but here's my theory about why the computer gets disconnected from the whole network:

When you go online, OpenArena sends a request to every master server you've set in your config, and not just once: it sends as many as it can in the time it has. Since it waits for a reply and gets none, it retries, retries, retries... and stops at some point because of an internal timeout in the client netcode.

Meanwhile, though, your OA client has spammed your network. If your router is a good one (like a Cisco), it probably has security measures such as disconnecting hosts that flood the network, to avoid overload. That's what happened to you.

That's all.
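The retry theory above can be modelled as a simple schedule: every master that hasn't answered is queried again after a fixed interval until an overall timeout expires. This is a hypothetical model to show how quickly the request count grows, not OpenArena's actual netcode; the master names, interval and timeout are illustrative parameters.

```python
def retry_schedule(masters, retry_interval: float, timeout: float):
    """(send_time_s, master) pairs for a client that never gets a reply."""
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    sends = []
    attempt = 0
    while attempt * retry_interval < timeout:
        for master in masters:
            sends.append((attempt * retry_interval, master))
        attempt += 1
    return sends


# Hypothetical: 3 configured masters, retried every 0.5 s for 5 s without replies
schedule = retry_schedule(["master1", "master2", "master3"], 0.5, 5.0)
print(len(schedule))  # 30
```

Multiplying by the index rather than accumulating `t += retry_interval` keeps the send times free of floating-point drift.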