Re: [liblo] revisiting getnameinfo slowness bug


Okay, I'm going to follow up on this thread, because I've just done a
bit of work on the use of getnameinfo() in lo_server initialization
and I want to document what I'm proposing to change.

TL;DR:
Proposed changes can be found in branch 'hostname':
https://github.com/radarsat1/liblo/tree/hostname

It would be great if someone could test this branch with
--enable-ipv6 and tell me whether lo_server_get_url() always returns
something sensible, and whether there are any long delays during
server initialization.

The long version:

Basically, I think the use of getnameinfo() in
lo_server_new_with_proto_internal() is wrong, and that this is what
has contributed to slowness during initialization.  The problem was
worked around by disabling the call to getnameinfo() in IPv4-only
mode, which I would say was the correct thing to do until it could be
better understood.  But I've done some reading, and I think I now
understand better what is going on here.

First, some really good references on modern socket programming that I
came across in this research:

Ulrich Drepper's IPv6 tutorial:
http://www.akkadia.org/drepper/userapi-ipv6.html

Beej's Guide to Network Programming:
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

It seems that getnameinfo() is being used solely to help fill in
lo_server.hostname.  If it fails, gethostname() is used as a
fallback, and if that also fails, "localhost" is used.  As far as I
can tell, the main purpose of lo_server.hostname is to be included in
the URL returned by lo_server_get_url().  In other words, it is used
to tell other processes, possibly on remote hosts, how to contact this
server.

In practice, after some testing, I found that getnameinfo() was
basically always failing.  On further inspection, this is because the
ai_addr field passed to it was never a useful address.  Further up in
this function, ai_addr is determined by getaddrinfo(), which is
correctly used with the AI_PASSIVE flag to obtain a list of addresses
to try to bind().  However, I can find nothing in the documentation
saying that getaddrinfo()+getnameinfo() can be used to find the
hostname.  In fact, on the Linux 2.6.35 and OS X 10.6 systems I
tested on, ai_addr seems to always be invalid for this purpose when no
hostname is provided: with AI_PASSIVE and a null node, getaddrinfo()
returns the wildcard address (INADDR_ANY or in6addr_any), which does
not identify any particular host.
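To illustrate, here is a minimal sketch (not liblo code) that asks
getaddrinfo() for a passive address with no node, as server
initialization does, and then formats the returned ai_addr with
getnameinfo().  The helper name passive_numeric_host() is
hypothetical, and NI_NUMERICHOST is used so that no DNS lookup is
attempted.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a passive (bindable) address for `port`
 * in the given address family and write the numeric form of the first
 * result into `host`.  Returns 0 on success, otherwise the
 * getaddrinfo()/getnameinfo() error code. */
int passive_numeric_host(int family, const char *port,
                         char *host, socklen_t hostlen)
{
    struct addrinfo hints, *res;
    int err;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE;   /* node == NULL -> wildcard address */

    err = getaddrinfo(NULL, port, &hints, &res);
    if (err != 0)
        return err;

    /* NI_NUMERICHOST: no reverse lookup, just format the address. */
    err = getnameinfo(res->ai_addr, res->ai_addrlen, host, hostlen,
                      NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return err;
}
```

For AF_INET this yields "0.0.0.0", and for AF_INET6 "::", neither of
which tells a remote peer how to reach the server.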

Now, getaddrinfo() can also be used to get the "canonical name" of a
host, via the AI_CANONNAME flag.  However, this requires providing a
hostname as input, which means calling gethostname() anyway.
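A sketch of that lookup might look like the following.  The function
names are hypothetical; in real use the node comes from gethostname(),
and the lookup may consult DNS, which can itself be slow.  If no
canonical name is reported, the sketch falls back to the name it was
given.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask getaddrinfo() for the canonical name of
 * `node`, falling back to `node` itself if none is reported.
 * Returns 0 on success or a getaddrinfo() error code. */
int canonical_name_for(const char *node, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    int err;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_CANONNAME;  /* needs a non-NULL node */

    err = getaddrinfo(node, NULL, &hints, &res);
    if (err != 0)
        return err;

    /* Only the first result carries ai_canonname. */
    snprintf(out, outlen, "%s",
             res->ai_canonname != NULL ? res->ai_canonname : node);
    freeaddrinfo(res);
    return 0;
}

/* Hypothetical wrapper: gethostname() first, since AI_CANONNAME
 * has nothing to canonicalize without a node. */
int local_canonical_name(char *out, size_t outlen)
{
    char name[256];

    if (gethostname(name, sizeof name) != 0)
        return EAI_SYSTEM;
    name[sizeof name - 1] = '\0';  /* may be unterminated if truncated */
    return canonical_name_for(name, out, outlen);
}
```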

One other thing: in the case of multicast, although gethostname()
still returns a valid name, the address that remotes should use to
contact the server is _not_ the hostname but the multicast group
address.  Therefore I think that for a multicast server,
lo_server_get_url() should return the multicast group.  So I propose
the following strategy:

1. If multicast, provide group address to getaddrinfo() with AI_CANONNAME.
2. Otherwise, provide hostname=null to getaddrinfo()

Later, when filling in hostname:
3. If ai_canonname != null, strcpy(hostname, ai_canonname)
4. Else, if group != null, strcpy(hostname, group)
5. Else, gethostname(hostname)
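The strategy above could be sketched roughly as follows.  The function
names are hypothetical and this is not the code in the 'hostname'
branch.  It separates choosing the getaddrinfo() arguments (steps 1-2)
from choosing the hostname (steps 3-5), and uses snprintf() rather
than strcpy() so a long name cannot overflow the buffer.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 1-2 (hypothetical helper): fill in hints and return the node
 * to pass to getaddrinfo().  A multicast group is looked up with
 * AI_CANONNAME; otherwise node is NULL and AI_PASSIVE gives a
 * bindable wildcard address. */
const char *choose_lookup(const char *group, struct addrinfo *hints)
{
    memset(hints, 0, sizeof *hints);
    hints->ai_family = AF_UNSPEC;
    hints->ai_socktype = SOCK_DGRAM;
    if (group != NULL) {
        hints->ai_flags = AI_CANONNAME;
        return group;
    }
    hints->ai_flags = AI_PASSIVE;
    return NULL;
}

/* Steps 3-5 (hypothetical helper): canonical name, else group,
 * else gethostname().  Returns 0 on success, -1 if gethostname()
 * fails. */
int choose_hostname(const char *canonname, const char *group,
                    char *hostname, size_t len)
{
    if (canonname != NULL) {
        snprintf(hostname, len, "%s", canonname);
        return 0;
    }
    if (group != NULL) {
        snprintf(hostname, len, "%s", group);
        return 0;
    }
    if (gethostname(hostname, len) != 0)
        return -1;
    hostname[len - 1] = '\0';  /* may be unterminated if truncated */
    return 0;
}
```

The caller would pass the first result's ai_canonname as canonname; in
the non-multicast case that is NULL, since no node was given, so the
chain falls through to gethostname().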

I don't see any point in using getnameinfo() to get the local
hostname; it doesn't seem to be intended for that.  I also don't see
the point in passing the result of gethostname() through
gethostbyname(), which, if I understand correctly, used to be a good
way to get the IP address but doesn't help in resolving a better
hostname.  It seems to be used just to check that the hostname *can*
resolve to an IP address, and I'm not sure that is a useful step.

We do need getnameinfo() to resolve the port number associated with
the "service", although in most cases I think this will simply give
back the provided port string unchanged, since ports are usually given
numerically.  Either way it doesn't hurt, and it means it's now
possible to create a server on port "http", for example.
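A minimal sketch of that round trip, assuming a UDP server:
getaddrinfo() maps the service string (numeric, or a name from the
services database) into a sockaddr, and getnameinfo() with
NI_NUMERICSERV turns the port back into a numeric string.  The helper
name service_to_port() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `service` ("7770", "http", ...) and
 * write its numeric port into `port`.  Returns 0 on success, otherwise
 * the getaddrinfo()/getnameinfo() error code. */
int service_to_port(const char *service, char *port, socklen_t portlen)
{
    struct addrinfo hints, *res;
    int err;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE;

    err = getaddrinfo(NULL, service, &hints, &res);
    if (err != 0)
        return err;

    /* Request only the service part; NI_NUMERICSERV returns the port
     * number rather than looking up a service name. */
    err = getnameinfo(res->ai_addr, res->ai_addrlen, NULL, 0,
                      port, portlen, NI_NUMERICSERV);
    freeaddrinfo(res);
    return err;
}
```

With "7770" this gives back "7770".  Whether a name like "http"
resolves depends on the services database having an entry for the
requested socket type (UDP here).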

I think this is the best replacement for the current code that
doesn't change its semantics.  Finally, though, I suspect the above
may still be somewhat wrong.  Although the hostname is valid locally,
there is no reason to think it is actually how remotes can identify
this host.  There can be several IP addresses and hostnames associated
with a server, at minimum both the hostname and "localhost".  It makes
me wonder whether it would be more appropriate to return a list of
URLs, one for each hostname associated with the computer's
interfaces, by calling getnameinfo() on each result of getifaddrs().
In practice I'm not sure what the user would do with such a list, and
I think it would get confusing, so simply using gethostname() seems to
be the best approach for now.
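For reference, the per-interface alternative could be sketched as
below.  This only illustrates the idea being weighed, not a proposal;
list_interface_hosts() is a hypothetical name.  getifaddrs() is
available on Linux and OS X but is not part of POSIX.  Passing flags
of 0 makes getnameinfo() attempt a reverse lookup for each address,
which could bring back exactly the kind of delay this thread is about,
so the caller can pass NI_NUMERICHOST instead.

```c
#define _DEFAULT_SOURCE
#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print "interface<TAB>host" for every IPv4/IPv6
 * interface address and return how many were printed, or -1 if
 * getifaddrs() fails. */
int list_interface_hosts(int flags)
{
    struct ifaddrs *ifa_list, *ifa;
    int count = 0;

    if (getifaddrs(&ifa_list) != 0)
        return -1;

    for (ifa = ifa_list; ifa != NULL; ifa = ifa->ifa_next) {
        char host[1025];
        socklen_t salen;

        if (ifa->ifa_addr == NULL)
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET)
            salen = sizeof(struct sockaddr_in);
        else if (ifa->ifa_addr->sa_family == AF_INET6)
            salen = sizeof(struct sockaddr_in6);
        else
            continue;  /* skip link-layer entries */

        if (getnameinfo(ifa->ifa_addr, salen, host, sizeof host,
                        NULL, 0, flags) == 0) {
            printf("%s\t%s\n", ifa->ifa_name, host);
            count++;
        }
    }
    freeifaddrs(ifa_list);
    return count;
}
```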

It also occurred to me that there's no reason we couldn't upgrade the
lo_server implementation to wait on multiple ports, and even multiple
protocols, simultaneously, since it already maintains a list of
sockets for TCP.  It could be useful to be able to make a single
server that handles both UDP and TCP.  But maybe that's for another
day.

Steve