HK1096499B - Method and apparatus for multi-language domain name service

Info

Publication number: HK1096499B
Application number: HK07101159.8A
Authority: HK
Inventors: 谭亭维; 沈青宏; 谭镌冈; 龙柯永; 唐．艾尔文．翠西．德．西尔瓦; 李光松; 爱德华．S．泰; 萨布拉曼尼恩．萨比阿
Original assignee: I-Dns通讯国际公司
Priority date: 1999-02-26
Filing date: 2007-02-01
Publication date: 2013-01-04

Description

Multi-language domain name service method and device

This application is a divisional application of application No. 00101694.6.

Technical Field

The present invention relates to a domain name service for resolving network domain names into corresponding network addresses. More particularly, the present invention relates to an alternative or improved domain name service that accepts domain names provided in a variety of encoding formats, not just ASCII.

Background

The Internet has evolved from a purely research and academic network into a global network that reaches diverse societies with different languages and cultures. It now provides services to users in all fields. E-mail can already be exchanged in many languages, and, with the proliferation of multilingual software applications, content on the World Wide Web is also distributed in many different languages. For example, a user may send an e-mail message in Chinese to another person or browse a web page in Japanese.

Today the Internet relies entirely on the Domain Name System (DNS) to resolve human-readable names into numeric IP addresses and vice versa. The DNS is still based on a subset of the Latin-1 alphabet and is therefore still predominantly English. To provide universality, e-mail addresses, Web addresses and other Internet addressing formats all use ASCII as a global standard to ensure interoperability. No current specification allows e-mail addresses or Web addresses to use native-language characters outside ASCII. This means that every user of the Internet must have some basic knowledge of ASCII characters.

Although this situation is generally not a problem for technical or commercial users who understand English as the international language of science, technology, business and politics, it is a great obstacle to the rapid spread of the Internet in countries where English is not widely used. In those countries, an Internet novice must learn some basic English as a precondition for sending e-mail in his or her native language, because even if the e-mail application supports the native language, the e-mail address does not. Intranets must likewise use ASCII to name their departmental domains and Web documents, because the protocol does not support anything other than ASCII in the domain name field, even though file names and directory paths may be multilingual locally.

Moreover, users who speak European languages must approximate their domain names with forms that lack accents and similar marks. For example, a company wishing to establish a corporate identity has to approximate its name with the closest ASCII equivalent and use "www.citroen.fr", and Mr. François from France has to put up with the constant annoyance of deliberately misspelling his e-mail address as "francois@email.fr" (a fictitious example).

Currently, just as operating systems can be localized with the corresponding local fonts, the user identification field of an e-mail address can be written in multilingual scripts. Directory and file names may also be provided in multilingual scripts. The domain name portions of these names, however, are limited to what is allowed by Internet standard RFC 1035, the standard that sets forth the Domain Name System.

One justifiable reason for this may be that the encodings software developers tend to use overlap. For example, the Chinese BIG5 and GB2312 encodings (i.e., the numeric representations of glyphs or characters) overlap, as do Japanese JIS and Shift-JIS and Korean KSC 5601. As a result, it is difficult to distinguish reliably between BIG5 and JIS, or between GB2312 and KSC5601, unless the request carries an additional parameter indicating which encoding the client application is using. Thus, to ensure that the encoding of a domain name is unique and unambiguous, the DNS has had to use ASCII.
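The overlap can be observed directly. The following Python sketch is offered for illustration only: it decodes one byte string under several legacy encodings using the codec names of the Python standard library. The sample text is an assumption chosen for the example and does not come from this description.

```python
# Bytes that a GB2312 client would send for a two-character Chinese word.
raw = "中文".encode("gb2312")

# Decode the identical byte string under several legacy encodings.
# errors="replace" keeps the demonstration from raising if a byte pair
# happens to be unassigned in one of the codecs.
candidates = ["gb2312", "big5", "euc_kr", "shift_jis"]
decodings = {name: raw.decode(name, errors="replace") for name in candidates}

for name, text in decodings.items():
    print(f"{name:10s} {raw.hex()} -> {text!r}")
```

Only the GB2312 reading recovers the intended text. The other codecs accept the same bytes and yield unrelated characters, which is why the bytes alone do not identify their encoding.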

According to RFC 1035, currently available domain names are limited to a subset of the ISO-8859 Latin-1 alphabet comprising the letters A-Z (case-insensitive), the digits 0-9, and the hyphen (-). This restriction effectively allows domain names to support only English, languages written in the Latin alphabet such as Malay, and romanized forms such as Japanese Romaji or transliterated Tamil. No other scripts are accepted, and even extended ASCII characters cannot be used.
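The restriction can be expressed as a simple check. The sketch below is a minimal illustration, assuming a hypothetical helper name `is_ldh_name`. It tests only the character set, label length and hyphen placement; the preferred syntax in RFC 1035 is somewhat stricter (a label begins with a letter).

```python
import re

# One label: letters, digits and hyphen only, 1 to 63 characters,
# with no hyphen at either end.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_name(name: str) -> bool:
    """Return True if every dot-separated label uses the restricted set."""
    labels = name.rstrip(".").split(".")
    return all(LABEL_RE.fullmatch(label) is not None for label in labels)


print(is_ldh_name("www.citroen.fr"))   # plain ASCII approximation
print(is_ldh_name("www.citroën.fr"))   # accented letter is outside the set
```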

The Unicode standard (Unicode) is a character encoding system in which each character of most of the world's major languages is uniquely mapped to a 16-bit value. Since the Unicode standard provides the basis for a unique, non-overlapping encoding system, some researchers have begun exploring how it could serve as the basis for a future DNS namespace that encompasses the rich diversity of languages in the world today. See M. Dürst, "Internationalization of Domain Names," Internet Draft "draft-duerst-dns-i18n-02.txt," July 1998, which can be found on the IETF home page at http://www.ietf.cnri.reston.va.us/ID.html. This document is incorporated herein by reference. The new namespace should provide multi-language and multi-script functionality so that non-English-speaking users can easily use the Internet.

Using the Unicode standard as the standard character set for a new domain name system avoids overlapping code spaces for different language scripts. In this way, the Internet community could use domain names in native-language scripts such as:

www.citroën.ch

www.genève-city.ch

Unfortunately, several difficulties have prevented the development of DNS servers and client applications that implement a multilingual domain name system. For example, all future client applications and all future servers would have to be modified. Transitioning from the old system to a new one can be difficult, because both clients and servers must be modified to work with the new system. In addition, few of the available client applications use the Unicode standard natively. Instead, most multilingual client applications use non-Unicode encodings, and these applications are in wide use.

Based on the above facts, it is highly desirable to have a technique that enables the use of multi-language encoding in a DNS system.

Disclosure of Invention

The present invention provides a system and method for implementing a multilingual domain name system that allows users to use domain names encoded in non-Unicode and non-ASCII character encodings. The method may be implemented in various systems or combinations of systems; for convenience, the implementing system is referred to herein as an international DNS server (or "iDNS" server). When an iDNS server first receives a DNS request, it determines the encoding type of that request. It may do so by taking the bit string of the top-level domain of the domain name and matching that string against a list of known bit strings for known top-level domains in different encoding types. One entry in the list may be, for example, the bit string for ".com" in Chinese BIG5. Once the iDNS server has identified the encoding type of the domain name, it converts the domain name to a universal language encoding type (e.g., the Unicode standard). The universal-encoding representation is then converted into an ASCII representation that complies with the conventional DNS standard. That representation is passed to a conventional domain name system, which recognizes the ASCII-format domain name and returns the associated IP address.

One aspect of the present invention provides a method for detecting the language encoding type of a numerically represented domain name. The method is characterized by the following steps: (a) receiving the number sequence of a pre-specified portion (e.g., the top-level domain) of the numerically represented domain name; (b) matching that number sequence against the known number sequences in a set of known number sequences, each of which is associated with a particular language encoding type; and (c) identifying the encoding type associated with the known number sequence that matches the number sequence from the domain name. Note that the set includes known number sequences for at least two different language encoding types.
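Steps (a) to (c) can be sketched as a table lookup on the bytes of the top-level domain. This is a minimal sketch under stated assumptions, not the claimed implementation: the native-script top-level label "公司" and the function name `detect_encoding` are hypothetical, and the known sequences are generated with Python's standard codecs rather than taken from this description.

```python
from typing import Dict, Optional

# Hypothetical set of known number sequences: the same top-level label
# as it appears on the wire under each encoding type.
KNOWN_TLD_SEQUENCES: Dict[bytes, str] = {
    b"com": "ascii",
    "公司".encode("big5"): "big5",
    "公司".encode("gb2312"): "gb2312",
}


def detect_encoding(domain: bytes) -> Optional[str]:
    """Return the encoding type whose known sequence matches the TLD bytes."""
    # (a) take the pre-specified portion: the label after the last dot
    tld = domain.rstrip(b".").rsplit(b".", 1)[-1]
    # (b) match it against the known sequences, (c) report the encoding type
    return KNOWN_TLD_SEQUENCES.get(tld)


print(detect_encoding("中文.公司".encode("big5")))
print(detect_encoding("中文.公司".encode("gb2312")))
print(detect_encoding(b"example.com"))
```

Splitting on the byte 0x2E is safe for these encodings because that value does not occur inside their multi-byte characters.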

It is often convenient to provide the set as a table whose records include, as attributes, a known number sequence and an encoding type. Identifying the encoding type then amounts to identifying the encoding type in the record whose known number sequence matches. The table covers at least the following encoding types: ASCII, BIG5, GB2312, Shift-JIS, EUC-JP, KSC5601, and extended ASCII.

Ambiguity must be resolved when at least two known number sequences match the number sequence in the domain name. This can be achieved by: (a) receiving the number sequence of a second portion of the numerically represented domain name; (b) decoding the number sequence of the second portion a plurality of times, each time using the decoding scheme of a different one of the language encoding types associated with the at least two matching known number sequences; and (c) identifying the decoding that gives the best result. Alternatively, the ambiguity can be resolved by first forming an extended number sequence (comprising the first and second portions of the domain name) and then matching the extended sequence against known number sequences that may correspond to it. In this case, the set of known number sequences must include some extended sequences.
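The first approach (trial decoding) might look like the sketch below. The description does not say how the "best result" is judged, so the scoring rule here is purely an assumption: a decoding that fails is discarded, and a decoding that yields a label found in a hypothetical set of registered names outranks one that merely decodes.

```python
from typing import Iterable, Optional

# Hypothetical registry of second-level labels, held as Unicode text.
REGISTERED_LABELS = {"中文", "例子"}


def resolve_ambiguity(label: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Decode `label` once per candidate encoding and keep the best one."""
    best, best_score = None, 0
    for encoding in candidates:
        try:
            text = label.decode(encoding)  # strict: invalid bytes raise
        except UnicodeDecodeError:
            continue
        # Assumed scoring: 2 for a registered label, 1 for any clean decode.
        score = 2 if text in REGISTERED_LABELS else 1
        if score > best_score:
            best, best_score = encoding, score
    return best


second_level = "中文".encode("gb2312")
print(resolve_ambiguity(second_level, ["big5", "gb2312"]))
```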

In one embodiment, the set of records includes the number sequence (or a representation of the number sequence) of a "minimum code resolving string (MCRS)". This is a partial number sequence of a domain name that is known to distinguish that domain name, in that particular encoding type, from every other domain name/encoding type combination in the set. As long as ambiguity is avoided when a match occurs, the MCRS may be a substring of the top-level domain, a superstring of the top-level domain, a string that spills over into the second- and third-level domains, and so on.

As mentioned above, the method is particularly applicable to handling DNS requests. Thus the method may also comprise: (i) receiving a DNS request that includes the numerically represented domain name; (ii) identifying a root-level DNS server responsible for resolving root-level domains for the identified encoding type; and (iii) sending the DNS request to that root-level DNS server. Before the DNS request is sent, the number sequence of the domain name is converted from the identified encoding type to a DNS encoding type compatible with the DNS protocol (e.g., ASCII, the Unicode standard, or another commonly used encoding that becomes available in the future). In one efficient embodiment, this conversion occurs in two operations: (i) converting the number sequence of the domain name from the identified encoding type to a universal language encoding type; and (ii) converting the number sequence of the domain name from the universal language encoding type to a DNS encoding type compatible with the DNS protocol.
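The two-operation conversion can be sketched as follows. Operation (i) uses a standard Python codec to reach Unicode. For operation (ii) the sketch uses a UTF-5-style hexadecimal transform, UTF-5 being named elsewhere in this description as an example of a DNS-compatible encoding. The exact alphabet used here (G-V for the first hexadecimal digit of each character, 0-9 and A-F for the rest) is an assumption that should be checked against the UTF-5 draft, and the function names are hypothetical.

```python
HEAD = "GHIJKLMNOPQRSTUV"  # assumed: first hex digit of a character, values 0..15


def utf5_encode_char(ch: str) -> str:
    """Encode one character as its hex digits, marking the first digit."""
    digits = format(ord(ch), "X")  # hexadecimal, no leading zeros
    return HEAD[int(digits[0], 16)] + digits[1:]


def to_dns_form(domain: bytes, encoding: str) -> str:
    # Operation (i): identified encoding type -> universal encoding (Unicode)
    text = domain.decode(encoding)
    # Operation (ii): Unicode -> ASCII letters and digits, label by label
    return ".".join(
        "".join(utf5_encode_char(ch) for ch in label)
        for label in text.split(".")
    )


print(to_dns_form("中文".encode("big5"), "big5"))
print(to_dns_form("中文".encode("gb2312"), "gb2312"))
```

Both calls print the same ASCII string, because the result depends only on the Unicode code points reached in operation (i), not on the client's original encoding.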

The invention also provides a mapping table that associates specific language encoding types with specific number sequences. The mapping table includes a plurality of records, each including the following attributes: (a) a known number sequence of a pre-specified portion of a numerically represented domain name; and (b) the language encoding type associated with that known number sequence. The pre-specified portion of the numerically represented domain name may be the number sequence of the root-level domain of the domain name. The records may also identify a root-level DNS server that is responsible for resolving root-level domains for the language encoding type in the record. In addition, the mapping table may specify the type of conversion required to convert the domain name from a non-DNS encoding type to a DNS-compatible encoding type (e.g., UTF-5).
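One possible in-memory shape for such a table is sketched below. The field names, the server names under the reserved ".example" domain, and the native-script root label are all hypothetical. Only the attribute list (known sequence, encoding type, responsible server, conversion type) is taken from the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MappingRecord:
    known_sequence: bytes  # bytes of the root-level domain in this encoding
    encoding: str          # language encoding type
    root_server: str       # hypothetical server for this encoding type
    translation: str       # conversion needed to reach a DNS encoding type


ROOT_LABEL = "公司"  # hypothetical native-script root-level domain

MAPPING_TABLE: List[MappingRecord] = [
    MappingRecord(b"com", "ascii", "ns.ascii.example", "none"),
    MappingRecord(ROOT_LABEL.encode("big5"), "big5", "ns.big5.example", "UTF-5"),
    MappingRecord(ROOT_LABEL.encode("gb2312"), "gb2312", "ns.gb2312.example", "UTF-5"),
]


def find_records(root_label: bytes) -> List[MappingRecord]:
    """Return every record whose known sequence equals the given bytes."""
    return [r for r in MAPPING_TABLE if r.known_sequence == root_label]


for record in find_records(ROOT_LABEL.encode("gb2312")):
    print(record.encoding, record.root_server, record.translation)
```

Returning a list rather than a single record keeps the ambiguous case visible: more than one match signals that further disambiguation is needed.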

The invention also relates to a device having the following features: (a) one or more processors; (b) a memory coupled to at least one of the one or more processors; and (c) one or more network interfaces that can receive a first DNS request including a domain name in a non-DNS encoding type and send a DNS request with the domain name in a DNS encoding type compatible with the DNS protocol, wherein at least one of the one or more processors is designed or configured to convert the domain name from the non-DNS encoding type to the DNS encoding type. At least one of the network interfaces is coupled to a network in a manner that enables the device to receive client DNS requests in which the domain name is presented in a non-DNS encoding type, and at least one of the network interfaces is coupled to a network in a manner that enables the device to send, to a conventional DNS server, DNS requests in which the domain name is presented in a DNS encoding type.

The device preferably further comprises a mapping table (preferably one as described above) residing at least partially in the memory, wherein the mapping table associates specific language encoding types with specific number sequences that are expected to be found in numerically encoded domain names.

These features and advantages of the present invention will be described in more detail with reference to the accompanying drawings.

Drawings

Fig. 1 shows schematically a network architecture comprising an iDNS between a DNS server and a client.

Fig. 2 is a flow diagram illustrating the processing of a DNS request to resolve a domain name presented in a non-DNS encoding type, in accordance with one embodiment of the present invention.

Fig. 3A is a flow diagram of converting a domain name of a non-DNS-encoding type to a domain name of a corresponding DNS-encoding type.

Fig. 3B is a logical configuration of an iDNS system.

Fig. 4 is a flow chart describing a process for determining the encoding type of a domain name.

Fig. 5 is an example of a logical mapping table identifying the encoding type of a domain name according to one embodiment of the invention.

Fig. 6 is a "tree" diagram depicting a hierarchy of Chinese encodings.

Fig. 7 is a block diagram of a general-purpose computer system that can be used to implement the iDNS functionality of the present invention.

Detailed Description

1. DNS and Unicode standards

The present invention converts multilingual-script names into a format compatible with the DNS (i.e., the DNS as specified, as of 1999, in RFC 1035). The converted names may be forwarded as DNS queries to a conventional DNS server. Fig. 1, described below, shows the process flow by which a localized domain name is resolved into its numeric IP address. Before Fig. 1 is described, however, some background principles and terminology will be discussed.

Programs rarely refer to hosts by their binary network addresses. Instead of binary numbers, they use ASCII strings, such as www.pobox.org.sg. The network itself, however, understands only binary addresses, so some mechanism is required to translate the ASCII strings into network addresses. This mechanism is provided by the Domain Name System.

The essence of the DNS is a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme. It is used primarily to map host names and e-mail destinations to IP addresses, but it may be used for other purposes as well. As noted above, the DNS is defined by RFC 1034 and RFC 1035.

Briefly, the DNS is used as follows. To map a name to an IP address, an application program calls a library procedure called a "resolver" (translator) and passes the name to it as a parameter. The resolver sends a UDP packet to a local DNS server, which looks up the name and returns the IP address to the resolver, which in turn returns it to the caller. With the IP address in hand, the program can establish a TCP connection with the destination or send UDP packets to it.
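To make the resolver's UDP packet concrete, the sketch below assembles the query message in the RFC 1035 wire format (a 12-byte header followed by one question) without sending it. The function name and the query identifier are arbitrary choices for the illustration.

```python
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for the A record of `name` (RFC 1035 layout)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet)
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_query("www.pobox.org.sg")
print(len(packet), packet.hex())
```

Because each label is encoded with `ascii`, a name containing any character outside ASCII raises an error at this step, which is the limitation the present invention addresses.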

Conceptually, the Internet is divided into a number of top-level "domains," each of which covers many hosts. Each domain is divided into subdomains, which are further subdivided, and so on. All of these domains can be represented by a tree. The leaves of the tree represent domains that have no subdomains (but which do, of course, contain machines). A leaf domain may contain a single host or may represent a company with thousands of hosts.

The top-level domains come in two kinds: generic and country. The generic domains are com (commercial), edu (educational institutions), gov (government), int (certain international organizations), mil (military), net (network providers), and org (organizations). The country domains include one entry for every country, as defined in ISO 3166. Each domain is named by the path upward from it to the unnamed root. The components are separated by periods (pronounced "dot").

In principle, a domain can be inserted into the tree in either of two ways. For example, cs.ucb.edu could equally well be listed under the us country domain as cs.ucb.ct.us. In practice, however, nearly all organizations in the United States are under a generic domain, and nearly all organizations outside the United States are under the domain of their country. There is no rule against registering under two top-level domains, but doing so may cause confusion, so few organizations do it.

Each domain controls how the domains beneath it are allocated. For example, Japan has the domains ac.jp and co.jp, which mirror edu and com. To create a new domain, permission is required from the domain in which it will be included. For example, if an artificial intelligence group is started at the University of California, Berkeley and wants to be known as ai.cs.ucb.edu, it needs permission from whoever manages cs.ucb.edu. Similarly, if a new university, such as Lake Tahoe university, is set up, it must request that the administrator of the edu domain assign it ulth. In this way, name conflicts are avoided and each domain can keep track of all of its subdomains. Once a new domain has been created and registered, it can create subdomains of its own, e.g., cs.

In theory at least, a single name server could contain the entire DNS database and respond to all queries about it. In practice, however, such a server would be so overloaded as to be useless. Furthermore, if it ever went down, the entire Internet would be crippled. To avoid the problems associated with having only a single source of information, the DNS namespace is divided into non-overlapping "zones". Each zone contains some part of the tree and also includes name servers that hold the authoritative information about that zone. Typically, a zone has one primary name server, which gets its information from a file on its disk, and one or more secondary name servers, which get their information from the primary name server.

When a resolver has a query about a domain name, it passes the query to a local name server. If the domain being sought falls under the jurisdiction of that name server, e.g., ai.cs.ucb.edu under cs.ucb.edu, the server returns the authoritative resource records. An authoritative record is one that comes from the authority that manages the record and is therefore always correct. A given name server may also hold "cached records," which may be out of date.

If the domain of interest is remote and no information about the requested domain is available locally, the name server sends a query message to the top-level name server for the requested domain. For example, a local name server attempting to find the IP address of ai.cs.ucb.edu may send a UDP packet to the server for edu given in its database, edu-server.net. That server probably does not know the address of ai.cs.ucb.edu, and probably does not know cs.ucb.edu either, but it must know all of its own children, so it forwards the request to the name server for ucb.edu. In turn, that server forwards the request to cs.ucb.edu, which must have the authoritative resource records. Since each request passes from a client to a server, the requested record works its way back, hop by hop, to the original name server that requested the IP address of ai.
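The chain of referrals in this example can be simulated without any network traffic. In the sketch below the delegation table and the returned address are invented for illustration (192.0.2.10 lies in a block reserved for documentation), and the real protocol's messages, timeouts and retries are omitted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical delegation chain: each zone knows the next zone down.
DELEGATIONS: Dict[str, str] = {
    "edu": "ucb.edu",
    "ucb.edu": "cs.ucb.edu",
}

# Hypothetical authoritative data held by the final zone.
AUTHORITATIVE: Dict[str, Dict[str, str]] = {
    "cs.ucb.edu": {"ai.cs.ucb.edu": "192.0.2.10"},
}


def resolve(name: str) -> Tuple[Optional[str], List[str]]:
    """Walk down from the top-level zone; return (address, zones visited)."""
    zone = name.split(".")[-1]  # start at the top-level domain
    path = [zone]
    while zone not in AUTHORITATIVE or name not in AUTHORITATIVE[zone]:
        if zone not in DELEGATIONS:
            return None, path  # nobody further down to ask
        zone = DELEGATIONS[zone]  # forward the request to the child zone
        path.append(zone)
    return AUTHORITATIVE[zone][name], path


address, visited = resolve("ai.cs.ucb.edu")
print(address, " -> ".join(visited))
```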

Once the record is returned to the original name server, the server enters it into a cache for later use. However, this information is not authoritative, as the modifications made in cs. Entries in the cache should be frequently deleted or updated. This may be implemented with a "time_to_live" field included in each record.
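A cache that honors a time-to-live value can be sketched in a few lines. The class name, method names and the injectable clock are assumptions made so that the example can be exercised without waiting for real time to pass.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Cache of name -> address entries that expire after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        # Store the address together with its absolute expiry time.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: drop it and force a new lookup
            return None
        return address


now = [0.0]  # fake clock so the example is deterministic
cache = TtlCache(clock=lambda: now[0])
cache.put("ai.cs.ucb.edu", "192.0.2.10", ttl_seconds=60)
print(cache.get("ai.cs.ucb.edu"))  # still fresh
now[0] = 61.0
print(cache.get("ai.cs.ucb.edu"))  # expired
```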

The method of resolving a domain name illustrated above is called a recursive query. Other techniques may also be used. For more details about the DNS, see Andrew S. Tanenbaum, "Computer Networks," 3rd Ed., Prentice Hall, Upper Saddle River, NJ (1996), from which the above description is drawn. See also U.D. Black, "TCP/IP and Related Protocols," 3rd Ed., McGraw-Hill, San Francisco, Calif. (1998). Both of these documents are incorporated herein by reference.

As mentioned above, the DNS protocol is currently based on a subset of ASCII and is therefore limited to the Latin alphabet. Other encodings provide digital representations of the world's other character sets. Examples include BIG5 and GB-2312 for Chinese character scripts (representing traditional and simplified characters, respectively), Shift-JIS and EUC-JP for Japanese character scripts, KSC-5601 for Korean character scripts, and the extended ASCII characters for French and German.

In addition to these language-specific encoding types, there is the Unicode standard (a "universal language encoding type"), which has the capacity to encode all of the characters used in the written languages of the world. It uses a 16-bit encoding, which provides codes for more than 65,000 characters. The scripts covered by the Unicode standard include Latin, Greek, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Georgian, Tibetan, Japanese kana, the complete set of modern Korean Hangul, and a unified set of Chinese/Japanese/Korean (CJK) ideographs. Many more scripts and characters are to be added shortly, including Ethiopic, Canadian Syllabics, Cherokee, additional rare ideographs, Sinhala, Syriac, Burmese, Khmer, and Braille.

A 16-bit number is assigned to each code element defined by the Unicode standard. Each of these 16-bit numbers is called a code value and, when referred to in text, is written in hexadecimal form following the prefix "U+". For example, the code value U+0041 is the hexadecimal number 0041 (equal to the decimal number 65). It represents the character "A" in the Unicode standard.

Each character is also assigned a unique name that identifies it. For example, U+0041 is assigned the character name "LATIN CAPITAL LETTER A", and U+0A1B is assigned the character name "GURMUKHI LETTER CHA". These Unicode names are identical to the ISO/IEC 10646 names for the same characters.
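The code values and character names mentioned above can be checked against the Unicode character database that ships with Python. The helper name below is a hypothetical convenience.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(ord("A"))            # decimal value of code value U+0041
print(describe("A"))
print(describe("\u0A1B"))
```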

The Unicode standard groups the characters of each script together in code blocks. A script is any system of related characters. Where possible, the standard retains the order of characters in a source set. When the characters of a script are conventionally arranged in a certain order, such as alphabetical order, the Unicode standard arranges them in the same order in its code space whenever possible. Code blocks vary greatly in size. For example, the Armenian code block does not exceed 256 code values, while the CJK code block contains tens of thousands of code values.

Code elements are grouped logically throughout the range of code values (called the code space). Coding begins at U+0000 with the standard ASCII characters and continues with Greek, Armenian, Hebrew, Arabic, Indic and other scripts, followed by symbols and punctuation. The code space continues with hiragana, katakana and bopomofo. The unified Han ideographs are followed by the complete set of modern Hangul. A surrogate range of code values is reserved for future expansion using UTF-16. Toward the end of the code space is a range of code values reserved for private use, followed by a range of compatibility characters. The compatibility characters are character variants that are encoded only to enable transcoding to earlier standards and to older implementations that use them.

A character encoding standard defines not only the identity of each character and its numeric value, or code position, but also how that value is represented in bits. The Unicode standard recognizes at least three forms, corresponding to the ISO 10646 transformation formats UTF-7, UTF-8, and UTF-16.

The ISO 10646 transformation formats UTF-7, UTF-8 and UTF-16 are essentially ways of turning the encoding into the actual bits used in an implementation. UTF-16 assumes 16-bit characters and allows a certain range of characters to be used as an extension mechanism, so that roughly one million additional characters can be accessed using pairs of 16-bit characters. The Unicode Standard, Version 2.0, Addison Wesley Longman (1996) (as modified and supplemented by "The Unicode Standard, version 2.0") adopts this transformation format of ISO/IEC 10646. That reference is likewise incorporated herein by reference.
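The pairing mechanism can be illustrated with the standard surrogate arithmetic. The function name is hypothetical, and the sample code point U+20000 is chosen only because it lies beyond the 16-bit range; it does not come from this description.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into two 16-bit code values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000    # 20 significant bits
    high = 0xD800 + (offset >> 10)   # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # lower 10 bits
    return high, low


high, low = to_surrogate_pair(0x20000)
print(f"{high:04X} {low:04X}")
print(chr(0x20000).encode("utf-16-be").hex())
```

Each half of the pair carries 10 bits, giving 2^20 (about one million) additional code points.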

The second transformation format is called UTF-8. It is a way of transforming all Unicode characters into a variable-length encoding of bytes. It has the advantage that the Unicode characters corresponding to the familiar ASCII set have the same byte values as in ASCII, and Unicode characters transformed into UTF-8 can be used with existing software without extensive rewriting of that software. The Unicode Consortium also endorses the use of UTF-8 as a way of implementing the Unicode standard. Any Unicode character expressed in the 16-bit UTF-16 form can be converted to the UTF-8 form and back without loss of information. To ensure consistency with the principles of the standard and the encoding architecture it embodies, the Unicode standard specifies certain explicit requirements. A conforming implementation has, as a minimum, the following characteristics:

characters are 16-bit units;

characters are interpreted with Unicode standard semantics;

unassigned codes are not used;

unknown characters are not corrupted.

A UTF-8 implementation of the Unicode standard is conforming as long as it treats each UTF-8 encoding (byte sequence) of a Unicode character as the corresponding 16-bit unit and interprets the characters according to the Unicode standard specification. The full conformance requirements are found in The Unicode Standard, Version 2.0, Addison Wesley Longman (1996), which has been incorporated herein by reference. UTF-7 is designed to provide 7-bit characters suitable for 7-bit media and transports. E-mail as specified by RFC 822 is a 7-bit system. UTF-16 is intended for 16-bit media and transports, and UTF-8 is intended for 8-bit media and transports. Most of the Internet can carry 8-bit data, but there are also legacy systems (e.g., DNS, SMTP e-mail, etc.) that use 7 bits.
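These properties of the three transformation formats can be observed with Python's built-in codecs. The sample names are taken from earlier in this description; the variable names are arbitrary.

```python
text = "www.citroën.ch"       # one non-ASCII letter among ASCII characters
ascii_only = "www.pobox.org.sg"

utf8 = text.encode("utf-8")        # variable-length, 8-bit units
utf16 = text.encode("utf-16-be")   # 16-bit units, big-endian, no byte-order mark
utf7 = text.encode("utf-7")        # restricted to 7-bit values

# ASCII characters keep their ASCII byte values in UTF-8.
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("ascii")

# Converting between the forms loses no information.
lossless = utf8.decode("utf-8") == utf16.decode("utf-16-be") == text

# Every UTF-7 byte fits in 7 bits, as a 7-bit transport requires.
seven_bit_safe = all(byte < 0x80 for byte in utf7)

print(len(utf8), len(utf16), same_bytes, lossless, seven_bit_safe)
```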

2. Terminology

Some of the terms used herein are not commonly used in the art. Other terms have a variety of meanings. The following definitions are therefore provided as an aid to understanding the following description. The invention in the claims is not limited to these definitions.

Language encoding type - any character or symbol encoding type now known or used in the future (e.g., ASCII or BIG5).

Universal language encoding type - any language encoding type, now known or developed in the future, that includes more than one character or symbol set within its encoding scope. The Unicode standard is one example; BIG5, ISO-8859-11 and GB-2312 are other examples.

Numerical representation - a way of representing characters that results from encoding them (e.g., as a bit stream, in hexadecimal format, etc.).

Number sequence - a particular sequence of 1s and 0s, hexadecimal characters, or other numerical representation.

"Portion" of a numerically represented domain name - any part or all of the domain name; e.g., the top-level domain, the second-level domain, or the top-level and second-level domains together.

"Known" number sequence - a number sequence that is of interest because it is known to be associated with some commonly used character combination (or other characteristic of a domain name) encoded in a particular encoding type (e.g., the BIG5 number sequence for ".com").

"Set" of known number sequences - any arrangement of, or association among, a plurality of known number sequences. Typically, though not necessarily, the sequences are stored logically together as a table (e.g., a "mapping table" as described herein).

DNS encoding type - an encoding type supported by the DNS protocol of the network or Internet, such as the restricted ASCII set specified by RFC 1035.

Non-DNS encoding type - an encoding type not supported by the DNS protocol under consideration, such as BIG5 under RFC 1035.

3. Implementation of iDNS

Turning now to Fig. 1, the principal components of a network 10 used in one embodiment of the present invention include a client 12, a corresponding node 14 with which the client 12 wishes to communicate, an iDNS server 16 and a conventional DNS server 18. The iDNS server 16 can listen on the DNS port (currently assigned port 53) for multilingual domain name queries in place of a conventional DNS server. The conventional DNS server may be, for example, the Berkeley Internet Name Domain (BIND) and its executable "named", a widely used DNS server written by Paul Vixie (http://WWW.isc.org/).

To understand the roles of these components, assume that a Chinese student using the client 12 wants to inquire about job openings at a company in Hong Kong, which operates the corresponding node 14. The student has previously communicated with the company and obtained its domain name. The domain name is provided in native Chinese characters. The client 12 is equipped with a keyboard for entering Chinese characters and with software that recognizes the encoded Chinese characters and displays them accurately on a computer screen.

The student now prepares a message to the Hong Kong company and attaches her resume. She enters the company's Chinese domain name as the destination. When she instructs the client 12 to send the message to the corresponding node 14, the system shown in Fig. 1 performs the following operations. First, the native-language domain name of the corresponding node is submitted to the iDNS server 16 in a DNS request. The iDNS server 16 recognizes that the domain name is not in a format that a conventional DNS server can handle. It therefore converts the Chinese domain name into a format that a conventional DNS server can use (usually a restricted set of ASCII characters). The iDNS server 16 then repackages the DNS request with the converted domain name of the corresponding node and sends that request to the conventional DNS server 18. The DNS server 18 then uses the normal DNS protocol to obtain the network address for the domain name it received in the DNS request. The resulting network address is the network address of the corresponding node 14. The DNS server 18 packages that network address according to the conventional DNS protocol and forwards it back to the iDNS server 16. The iDNS server 16 then sends the requested network address back to the client 12, where the address is placed in the student's message. The message is packetized, with each packet carrying the destination network address of the corresponding node 14. The client 12 then sends the message packets to the node 14 via the Internet.

This process can be understood more fully from the interaction flow diagram of Fig. 2. In the figure, the client 12 is represented by a vertical line on the left, the iDNS server 16 by a vertical line in the center, and the DNS server 18 by a vertical line on the right.

First, at 203, an application running on the client 12 generates a message to a network destination whose domain name is entered in a non-DNS-compatible text encoding format. That is, the text is encoded with a language encoding type that numerically represents the characters of the text. As noted, ASCII is only one such encoding type. Other widely used types, which the preferred embodiment also handles, include GB2312, BIG5, Shift-JIS, EUC-JP, KSC5601, extended ASCII, etc.
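To make the idea of a language encoding type concrete, the following minimal sketch shows a single label producing different number sequences under several of the encoding types named above. It relies on Python's built-in codecs. The label `公司` and the choice of codec names are illustrative assumptions and are not taken from the described system.

```python
# Encode one label under several encoding types and show the resulting bytes.
# LABEL and CODECS are illustrative assumptions, not part of the described system.
LABEL = "公司"  # "company"
CODECS = ["gb2312", "big5", "shift_jis", "euc_jp"]


def number_sequences(label, codecs):
    """Return {codec: hex string of the bytes that represent the label}."""
    return {codec: label.encode(codec).hex() for codec in codecs}


sequences = number_sequences(LABEL, CODECS)
for codec, hex_bytes in sequences.items():
    print(f"{codec:10s} {hex_bytes}")
```

The same characters yield different byte values in each encoding, which is why a server that receives only the bytes must first work out which encoding type was used.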

After the client application generates a message at 203, the client operating system generates a DNS request to resolve the domain name at 205. The DNS request may be similar to a conventional DNS request in many respects, except that the domain name in the request is provided in a non-DNS encoding format. At 207, the client operating system sends its DNS request to the iDNS server 16. Note that the client operating system may be configured to send DNS requests to the iDNS server 16. In other words, the default DNS server for the client 12 is the iDNS server 16.

At 209, the iDNS server 16 extracts the encoded domain name from the DNS request and generates a translated DNS request that represents the domain name in a DNS-compatible encoding format (currently reduced ASCII as specified by RFC 1035). Next, at 211, the iDNS server 16 sends the translated DNS request to the conventional DNS name server 18. At 213, the name server obtains the IP address for the domain name used in the client communication using conventional DNS protocols. At 215, the name server returns the requested IP address to the iDNS server 16. At 217, the iDNS server 16 returns the IP address to the client 12. Finally, at 219, the client sends its message to the destination using the IP address now at hand.
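The exchange of fig. 2 can be sketched as a toy model in which the proxy translates the name, forwards it, and relays the answer. This is only an illustration under stated assumptions: the zone data, the function names, and the hex-based `stand_in_converter` are hypothetical, the converter is not the conversion algorithm of the invention, and no real DNS traffic is generated.

```python
# Toy model of the FIG. 2 exchange. All names and data here are hypothetical.

def stand_in_converter(raw_name: bytes) -> str:
    """Placeholder for operation 209: hex-encode every non-ASCII label."""
    converted = []
    for label in raw_name.split(b"."):
        if label.isascii():
            converted.append(label.decode("ascii").upper())
        else:
            converted.append("X" + label.hex().upper())
    return ".".join(converted)


# The name as typed by the client, in a native encoding (assumed: GB2312).
RAW_NAME = "公司".encode("gb2312") + b".com"

# Records held by the conventional DNS server 18, keyed by DNS-compatible name.
CONVENTIONAL_ZONE = {stand_in_converter(RAW_NAME): "12.34.56.78"}


def conventional_dns_lookup(ascii_name: str):
    """Operation 213: an ordinary lookup keyed by a DNS-compatible name."""
    return CONVENTIONAL_ZONE.get(ascii_name)


def idns_proxy(raw_name: bytes, to_dns_compatible):
    """Operations 209 to 217: translate, forward, and relay the answer."""
    ascii_name = to_dns_compatible(raw_name)        # 209: translate the name
    address = conventional_dns_lookup(ascii_name)   # 211 to 215: forward and wait
    return address                                  # 217: relay to the client


print(idns_proxy(RAW_NAME, stand_in_converter))
```

Because the proxy is the only component that sees the native encoding, the conventional server and its records remain unchanged.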

As mentioned above, domain names must at some point be converted from a non-DNS encoding type to a DNS-compatible encoding type. In the above example, this is done by a proxy iDNS server. The invention is not limited to this arrangement, however, and the functionality required for this conversion may also be included in the client or in a conventional DNS server.

In an alternative embodiment, the functions performed by the proxy iDNS server are performed in whole (or in part) on the client and/or the DNS server. In one embodiment, the operations (operations 305 and 311 of FIG. 3A, described below) are implemented in an Internet application (e.g., a Web browser that supports multiple languages) and include detecting an encoding type, converting a non-DNS encoded domain name to a DNS encoded domain name, and identifying a default name server. In this embodiment, encoding detection and conversion are performed automatically before a DNS resolution request is dispatched to a DNS server. In some embodiments, the application may supply a predefined language encoding, which makes encoding detection unnecessary.

In another alternative embodiment, operations 305 and 311 may be implemented on an iDNS server. Other embodiments include distributing all or a portion of the operations of the proxy iDNS server to DNS servers. For example, the code for some iDNS functions can be distributed as modules compatible with BIND.

In fig. 2, the operation of converting the domain name from one language encoding type to a second language encoding type (compatible with DNS) is performed at 209. As shown in fig. 3A, the conversion may be performed by process 301, in accordance with a preferred embodiment of the present invention. The process begins at 303 with the system identifying the encoding type of the domain name in the DNS request. This operation is necessary when the system may encounter a variety of different encoding types. After the encoding type is identified, the system determines at 305 whether the domain name is encoded with a DNS-compatible encoding type. At present, this means determining whether the domain name is encoded in the reduced ASCII set. If so, no further conversion is needed and process control flows to 311, described below.
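The test at 305 can be sketched as a check that every label uses only the reduced ASCII set. The sketch below is a simplification under stated assumptions: it accepts letters, digits, and hyphens and enforces the 63-octet label limit of RFC 1035, but it does not enforce the rule that labels begin with a letter. The function name `is_dns_compatible` is hypothetical.

```python
import string

# Byte values allowed in a conventional host name label: letters, digits, hyphen.
ALLOWED = set((string.ascii_letters + string.digits + "-").encode("ascii"))


def is_dns_compatible(raw_name: bytes) -> bool:
    """Operation 305 (simplified): True if every label is reduced ASCII."""
    if raw_name.endswith(b"."):          # tolerate a fully qualified name
        raw_name = raw_name[:-1]
    labels = raw_name.split(b".")
    return all(
        label and len(label) <= 63 and set(label) <= ALLOWED
        for label in labels
    )


print(is_dns_compatible(b"example.com"))
print(is_dns_compatible("公司".encode("gb2312") + b".com"))
```

A name that passes this check can skip operations 307 and 309 and go straight to name server selection at 311.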

In the more interesting case, the domain name is encoded in a non-DNS format. Process control then passes to 307, where the system converts the domain name to a common encoding type. In a preferred embodiment, the common encoding type is the Unicode standard. In this case, the characters represented in the native language encoding type are mapped to their Unicode equivalents, and the domain name is converted into a Unicode sequence.

The newly converted domain name is then further converted, at 309, from the Unicode standard to a DNS-compatible encoding type. Thus, the final encoding type may be the reduced ASCII set. Note that the conversion from a DNS-incompatible format to a DNS-compatible format proceeds in two steps with one common encoding type in between. These two steps are described in detail below. However, it should be understood that a single step may also be used to convert directly from a DNS-incompatible domain name to a DNS-compatible domain name. This may be implemented in a system having multiple conversion algorithms, where each algorithm converts one specific encoding type to ASCII (or another future DNS-compatible encoding type). In one example, these algorithms can be modeled on the Dürst algorithm mentioned above. Many other suitable algorithms are known or may be developed.
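The benefit of the intermediate common encoding can be illustrated with a short sketch: two different native byte sequences for the same name converge on the same Unicode sequence, so a single converter suffices for operation 309. The function names and the `hex_code_points` converter are hypothetical stand-ins. In particular, `hex_code_points` is not the algorithm of the cited draft and only marks the place where that step would run.

```python
def to_unicode(raw_label: bytes, native_encoding: str) -> str:
    """Operation 307: native encoding type -> Unicode."""
    return raw_label.decode(native_encoding)


def convert(raw_label: bytes, native_encoding: str, unicode_to_dns) -> str:
    """Operations 307 and 309 chained; the second step is supplied by the caller."""
    return unicode_to_dns(to_unicode(raw_label, native_encoding))


def hex_code_points(text: str) -> str:
    """Hypothetical stand-in for operation 309: code points as hex groups."""
    return "-".join(f"{ord(ch):04X}" for ch in text)


gb_bytes = "中文".encode("gb2312")    # the label as a GB2312 client would send it
big5_bytes = "中文".encode("big5")    # the same label from a BIG5 client

print(convert(gb_bytes, "gb2312", hex_code_points))
print(convert(big5_bytes, "big5", hex_code_points))
```

With N native encoding types, the two-step design needs N decoders and one Unicode-to-DNS converter, whereas the direct approach needs a separate converter for each encoding type.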

Now that a DNS-compatible domain name is at hand, the system need only determine to which conventional DNS name server the domain name should be forwarded. According to typical DNS protocols, DNS requests may be forwarded to a top-level name server. It may be convenient to have root name servers that handle the different language domains, as will be described in detail below. For example, the Chinese government may maintain a root name server for Chinese domain names, the Japanese government or a Japanese company may maintain a root name server for Japanese domain names, the Indian government may maintain a root name server for Indian domain names, etc. In any case, as shown in FIG. 3A, the system must identify the appropriate name server at 311. After this operation is completed, the conversion process is complete and the DNS request can be sent to the DNS system for processing in accordance with the present invention.
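Operation 311 amounts to a lookup from the detected encoding type to a name server. A minimal sketch follows. Every server name is a hypothetical placeholder under the reserved `.example` domain, and the grouping of encoding types by language is an assumption made for illustration.

```python
# Hypothetical mapping from detected encoding type to a root name server.
ROOT_SERVERS = {
    "gb2312": "zh-root.example",
    "big5": "zh-root.example",
    "shift_jis": "ja-root.example",
    "euc_jp": "ja-root.example",
    "ksc5601": "ko-root.example",
}
DEFAULT_SERVER = "root.example"   # used for plain ASCII names and unknown types


def select_name_server(encoding_type: str) -> str:
    """Operation 311: pick the name server responsible for this encoding type."""
    return ROOT_SERVERS.get(encoding_type.lower(), DEFAULT_SERVER)


print(select_name_server("BIG5"))
```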

Preferably, the process shown in fig. 3A is performed at only one iDNS server. However, some of the processing may be performed at a client or a conventional DNS server. For example, 303 and 305 may be executed on a client and 309 may be executed on a conventional DNS server.

The preferred division of work for the iDNS function (327) is shown in FIG. 3B, where an iDNS mapper server 321 performs operations 305 and 311 as shown. The mapper server includes a mapping table (such as the example described with reference to fig. 5) and converts the encoding types of all languages to the Unicode standard (or another suitable common encoding type). In this embodiment, a client 325 performs operation 303 and a conventional DNS server 323 performs standard DNS resolution protocols.

In one implementation, the iDNS mapper server 321 runs on a machine (e.g., one designated i2.i-dns.com) on a specified port (e.g., port number 2000). It accepts domain names numerically represented in any language encoding type and returns them converted to a single DNS encoding type (UTF-5). Note that the mapping table and translator code may be large and would increase the size of the DNS server 323 if implemented there. By separating operation 305 from the DNS protocol 311, the amount of code required to deploy iDNS can be reduced.

As shown in fig. 3A, when the system has to handle a large number of coding types, it has to be able to distinguish the coding types. The process is as depicted in block 303 and illustrated in fig. 4.

As shown in fig. 4, the process 401 of identifying the encoding type begins at 403, where the system identifies the number sequence of the top-level domain of the domain name. The top-level domains in the system as of March 1999 include .com, .edu, .gov, .mil, .org, .int, .net, and the various two-letter country identifiers (e.g., .fr, .sg, .kr, etc.).

After the number sequence of the top-level domain is identified, the system matches the sequence to a specific encoding type. In a preferred embodiment, this includes matching, at 405, the sequence to a record in a mapping table. An exemplary mapping table is described in detail below. For now, simply consider that the table (or other logical structure) includes a list of number sequences for the different top-level domains in the different language encoding types handled by the system. Each record also includes an associated encoding type identifier. The system matches the number sequence under consideration by comparing it to the sequences in the records of the mapping table (using an established database lookup technique such as binary search, a hash table, a B-tree, etc.). This will generally yield a single match. However, if multiple entities can issue top-level domains (e.g., one for each language), the number sequences of two top-level domains in different encoding formats may be identical.

With this possibility in mind, the system determines at 407 whether a plurality of records match the number sequence under consideration. If only a single record matches, the process ends at 413 with the system using the encoding type identified in that record. If two or more records match, the system must resolve the ambiguity. At 409, a lower-level domain (e.g., a subdomain such as a second-level domain) is identified. In other words, the number sequence under consideration is extended with the number sequence of the next lower-level domain in the domain name. The extended number sequence is then matched again against the number sequences in the mapping table (405). Note that some records in the table may include a number sequence that combines a top-level domain and a lower-level domain (to address potential ambiguities in the sequence of the top-level domain). After a match is found at 405, processing proceeds to 407 as described above.
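The detection loop of fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the table contents are made up, the linear scan stands in for whichever lookup structure is used, and labels are split on the byte `0x2E`, which assumes that byte never occurs inside a multi-byte character of the encodings involved.

```python
# Hypothetical mapping table: (number sequence, encoding type).
# The byte values are made up to produce an ambiguous top-level sequence.
MAPPING_TABLE = [
    (b"com", "ascii"),
    (b"\xb9\xab", "gb2312"),
    (b"\xb9\xab", "big5"),             # same top-level sequence, other encoding
    (b"\xd6\xd0.\xb9\xab", "gb2312"),  # extended record: second level + top level
    (b"\xa4\xa4.\xb9\xab", "big5"),
]


def detect_encoding(raw_name: bytes, table=MAPPING_TABLE):
    """Return the encoding type for raw_name, or None if it cannot be decided."""
    labels = raw_name.split(b".")
    for depth in range(1, len(labels) + 1):
        # 403 on the first pass, 409 afterwards: take one more lower-level label.
        sequence = b".".join(labels[-depth:])
        # 405: collect the encoding types of all records with this sequence.
        matches = {enc for seq, enc in table if seq == sequence}
        if len(matches) == 1:          # 407: exactly one encoding type matches
            return matches.pop()       # 413: use it
        if not matches:                # nothing known for this sequence
            return None
    return None                        # still ambiguous after using every label


print(detect_encoding(b"www.example.com"))
print(detect_encoding(b"\xc4\xe3.\xd6\xd0.\xb9\xab"))
```

A dictionary keyed by number sequence would avoid the linear scan; the list form is used here only to keep duplicate sequences visible.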

In an alternative embodiment, only the number sequences of top-level domains are maintained in the mapping table, and no extended sequences are processed to resolve ambiguity. In this case, when the answer at 407 is positive (multiple records match), the system identifies each potential match (candidate encoding type). The sequence under consideration is then decoded using each candidate encoding type. For example, a top-level domain number sequence might match .net in one of the Japanese encoding types and .com in one of the Chinese encoding types.

One decoded string should be intelligible in the language of its candidate encoding type. The other should be gibberish. The system therefore selects the candidate encoding type that best decodes the second-level domain. The process then ends at 413 with the system using the selected encoding type.
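A crude version of this trial decoding is sketched below. It counts the bytes that each candidate codec cannot decode and prefers the candidate with the fewest failures. This is only a proxy for "intelligible": many byte sequences decode without error in several encodings, so a practical system would need a further linguistic check such as a dictionary lookup. The function name `pick_encoding` is hypothetical.

```python
def pick_encoding(raw_label: bytes, candidates):
    """Return the candidate codec that decodes raw_label with the fewest errors.

    Ties are resolved in favor of the earlier candidate.
    """
    def damage(codec: str) -> int:
        # Undecodable bytes become U+FFFD; count them as a measure of gibberish.
        decoded = raw_label.decode(codec, errors="replace")
        return decoded.count("\ufffd")

    return min(candidates, key=damage)


label = "日本".encode("shift_jis")   # a second-level label in a Japanese encoding
print(pick_encoding(label, ["gb2312", "shift_jis"]))
```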

As depicted at 405 of fig. 4, the iDNS server may match the number sequence of the top-level domain of a queried domain name against known number sequences of multiple encoding types. A mapping table may hold the known number sequences. Fig. 5 provides a mapping table 501 according to an embodiment of the invention. Each record in table 501 specifies a minimum code resolution string (e.g., a top-level domain) for a particular encoding type (e.g., .com in BIG5).

As shown, mapping table 501 includes 6 fields. The first is a validity period that indicates how long the entry may be cached before it expires. The minimum code resolution string field then identifies a portion of the number sequence of a domain name (e.g., the numeric encoding of .com in BIG5). Note that the minimum code resolution string is typically an 8-bit binary string. To simplify entry and maintenance of the minimum code resolution string in table 501, the binary string is converted to the format shown.

Although the minimum code resolution string will generally be the top-level domain, it is not so limited. For some language encodings, ambiguity may make it necessary to include a second-level or lower-level domain in order to uniquely resolve the encoding type of the string. Conversely, it is often unnecessary to use the entire top-level domain to uniquely determine the encoding type, and using a shorter string speeds up the search for matches.

The "authority" specified in the table is an organization that has authority over the domain name specified in the record. For example, an "i-dns" organization may be authoritative for .com in BIG5 and may therefore have the right to issue all sub-domain names under .com in BIG5. This ensures that only unique domain names are assigned. Of course, to exercise such authority, the organization named in the "authority" field must operate a name server (or servers) that assigns IP addresses to the domain names within its authoritative portion of the DNS space. The "encode" field of table 501 indicates the encoding type of the domain name that matches the record. The "convert" field indicates the encoding type to which the matching domain name is converted. UTF-5, for example, is the Dürst algorithm (described below) applied to Unicode. The last field, "comment", includes a text string that identifies which portion of a domain name corresponds to the minimum code resolution string.

FIG. 6 illustrates an example domain name tree for resolving Chinese domain names. An iDNS server detects Chinese encoding types and is configured as the default name server for resolving a domain name. As shown in fig. 6, there are multiple top-level domains (e.g., .com, .edu, .sg, etc.) under the root. Under the .sg top-level domain, there are multiple Chinese second-level domains such as edu.sg, under which there are multiple domains, including nus. Under .com, there are likewise multiple Chinese-language domains, e.g., email.
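The six fields of a record in mapping table 501 can be sketched as a small data structure. The field names follow the description of table 501, while the Python types and all example values (the time to live, the byte string, and the comment) are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MappingRecord:
    """One record of mapping table 501 (types and values are assumptions)."""
    ttl_seconds: int         # validity period of the cached entry
    min_code_string: bytes   # minimum code resolution string, as raw bytes
    authority: str           # organization authoritative for the domain
    encode: str              # encoding type of a matching domain name
    convert: str             # encoding type the name is converted to
    comment: str             # which part of the name the string stands for


# Hypothetical example record for a Chinese-language top-level domain in BIG5.
record = MappingRecord(
    ttl_seconds=86400,
    min_code_string="公司".encode("big5"),
    authority="i-dns",
    encode="BIG5",
    convert="UTF-5",
    comment=".com",
)
print(record)
```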

As shown in the embodiment of fig. 3A, the iDNS system converts a common encoding type (e.g., the Unicode standard) of a domain name into a DNS encoding type. In one useful embodiment, this is accomplished by the conversion algorithm defined in the Internet draft "Internationalization of Domain Names" by Martin Dürst, which is incorporated herein by reference. The algorithm converts a variable-length data entity into a format that includes only RFC-compliant ASCII letters and digits. The following table shows the conversion table used in the Internet draft.

The first two columns of the table are binary (hexadecimal) values and the last two columns are RFC 1035 compatible ASCII characters. "Initial" and "subsequent" refer to the initial nibble of the data entity and the remaining nibbles of the data entity, respectively. If a data entity is two bytes long (as in UCS-2), there are 4 nibbles in that data entity.
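The conversion table itself is not reproduced in this text, so the sketch below rests on an assumption about its contents: an initial nibble 0x0 to 0xF maps to the letters G to V, and a subsequent nibble maps to the ordinary hexadecimal digits 0 to 9 and A to F. It is also assumed that leading zero nibbles of a data entity are dropped. Under these assumptions the sketch reproduces the first label of the example record given later in this description. The function names are hypothetical.

```python
INITIAL = "GHIJKLMNOPQRSTUV"      # assumed: nibble 0x0..0xF that starts an entity
SUBSEQUENT = "0123456789ABCDEF"   # assumed: nibble 0x0..0xF anywhere else


def encode_entity(value: int) -> str:
    """Encode one data entity as an initial letter plus hexadecimal digits."""
    if value < 0:
        raise ValueError("data entity must be non-negative")
    nibbles = [int(digit, 16) for digit in f"{value:X}"]   # no leading zeros
    head, tail = nibbles[0], nibbles[1:]
    return INITIAL[head] + "".join(SUBSEQUENT[n] for n in tail)


def encode_label(entities) -> str:
    """Concatenate the encodings of a sequence of data entities."""
    return "".join(encode_entity(value) for value in entities)


print(encode_label([0xE4B8, 0x87E7, 0xBBB4, 0xE7BD, 0x91]))
```

Because only the first character of each entity is drawn from G to V, a decoder can find entity boundaries without separators, and the output stays within the letters and digits permitted by RFC 1035.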

As described above, to resolve a multilingual domain name, a client application submits a multilingual, non-RFC-compliant query to an iDNS proxy server. The proxy server then converts the query to an RFC-compatible format using the conversion algorithm and submits the query to the DNS server.

At the DNS server, there is an entry for the RFC compliant query that matches a valid IP address, e.g.

U4B8O7E7RBB4U7BDP1.U696RO5OAAOU59DQ1 IN A 12.34.56.78

The DNS server returns the IP address to the iDNS proxy according to RFC 1035. The proxy server forwards the message including the correctly resolved IP address to the client. Note that the converted domain names (ASCII) must typically be registered by an authority responsible for controlling and publishing the conventional DNS domain names.

Embodiments of the present invention relate to an apparatus for performing the above-described iDNS operations. The apparatus may be specially designed and constructed for the required purposes, or it may be a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. The processes described herein are not inherently tied to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings of the present invention. Alternatively, it may be more convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will be apparent from the description above.

Additionally, embodiments of the present invention further relate to computer-readable media that include program instructions for performing various computer-executable operations. The media may also include program instructions, data files, data structures, tables, etc., or a combination thereof. The media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available in the computer software arts. Examples of the computer readable medium include magnetic media such as a hard disk, a floppy disk, and a magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as magneto-optical disks; and hardware specially configured to store and execute program instructions, such as Read Only Memory (ROM) and Random Access Memory (RAM). The medium may also be a transmission medium such as optical or metallic lines, wave guides, etc., including a carrier wave or the like for transmitting signals specifying the program instructions, data structures, etc. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

FIG. 7 illustrates a typical computer system according to an embodiment of the invention. The computer system 700 includes any number of processors 702 (also referred to as central processing units, or CPUs) that are coupled to storage devices, including a main memory 706 (typically a random access memory, or RAM) and a main memory 704 (typically a read-only memory, or ROM). As is known in the art, main memory 704 is typically used for transferring data and instructions uni-directionally to the CPU, and main memory 706 is used for transferring data and instructions in a bi-directional manner. Both of these main memories may include any suitable type of the computer-readable media described above. A mass storage device 708 is also coupled bi-directionally to CPU 702, provides additional data storage capacity, and may include any suitable type of computer-readable media as described above. The mass storage device 708 may be used to store programs, data and the like and is typically a secondary storage medium, such as a hard disk, that is slower than primary storage. It will be appreciated that the information retained within mass storage device 708 may, in appropriate cases, be incorporated in standard fashion as part of main memory 706 as virtual memory.

One particular mass storage device is, for example, a CD-ROM 714, which may also pass data uni-directionally to the CPU.

CPU 702 is also coupled to an interface 710, which may include one or more input/output devices such as a video monitor, trackball, mouse, keyboard, microphone, touch-sensitive display screen, sensor card reader, magnetic or paper tape reader, tablet, stylus, voice or handwriting recognizer, or other well-known input devices, including, of course, other computers. Finally, CPU 702 may optionally be coupled to a computer or communications network using a network connection as shown at 712. Through this network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network, in the course of performing the above-described method steps. The above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.

The hardware elements described above may be configured to execute one or more software modules that perform the operations of the present invention. Instructions for operations such as detecting an encoding type, converting the encoding type, and identifying a default name server may be stored on mass storage device 708 or 714 and executed by CPU 702 in conjunction with main memory 706.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, certain changes and modifications are possible within the scope of the appended claims.

Claims

1. A method implemented on a device for detecting a language encoding type of a digitally represented domain name, the method comprising:

receiving a numerical sequence of a pre-specified portion of a numerically represented domain name;

matching the number sequences in the domain name with known number sequences in a set of known number sequences, wherein each known number sequence is associated with a particular language coding type and the set includes known number sequences of at least two different language coding types; and

identifying a language encoding type associated with a known number sequence that matches the number sequence in the domain name.

2. The method of claim 1, further comprising receiving a DNS request containing the digitally represented domain name.

3. The method of claim 1, wherein the pre-specified portion of the digitally represented domain name is a minimum code resolution string in the domain name.

4. The method of claim 1, further comprising converting the format of the digit sequence prior to matching the digit sequence of the digitally represented domain name.

5. The method of claim 1, wherein the set of known number sequences is located in a table containing records having attributes including the known number sequences and the encoding type.

6. The method of claim 5, wherein the table contains records having at least the following encoding types: ASCII, BIG5, GB2312, Shift-JIS, EUC-JP, KSC5601, and extended ASCII.

7. The method of claim 1, wherein at least two known number sequences match the number sequence in the domain name, and the method further comprises:

receiving a sequence of digits of a second portion of a digitally represented domain name; and

matching the number sequence of the second portion to known number sequences in the set of known number sequences.

8. An apparatus for detecting a language encoding type of a digitally represented domain name, the apparatus comprising:

means for receiving a numerical sequence of a pre-specified portion of a numerically represented domain name;

means for matching the number sequences in the domain name with known number sequences in a collection of known number sequences, wherein each known number sequence is associated with a particular language coding type and the collection includes known number sequences of at least two different language coding types; and

means for identifying the type of language code associated with a known number sequence that matches the number sequence in the domain name.

9. The apparatus of claim 8, further comprising means for receiving a DNS request.

10. The apparatus of claim 8, further comprising means for receiving a DNS request containing the digitally represented domain name.

11. The apparatus of claim 8, wherein the pre-specified portion of the digitally represented domain name is a minimum code resolution string in the domain name.

12. A method of resolving domain names in a multilingual domain name system, comprising:

matching the number sequences in the domain name with known number sequences in a set of known number sequences, wherein each known number sequence is associated with a particular language coding type and the set includes known number sequences of at least two different language coding types;

identifying a language code type associated with a known number sequence that matches the number sequence of the domain name;

identifying an appropriate name server to which the domain name is to be forwarded based on the identified type of language coding; and

forwarding the domain name to the name server for resolution.

13. The method of claim 12, wherein the set of known digit sequences is located in a logical structure that includes attributes comprising the known digit sequences and the encoding type.

14. The method of claim 13, wherein the logical structure comprises records having at least two of the following encoding types: ASCII, BIG5, GB2312, Shift-JIS, EUC-JP, KSC5601, and extended ASCII.

15. The method of claim 12, further comprising: identifying a top-level DNS server responsible for resolving root-level domains of the detected encoding type.

16. An apparatus for resolving domain names in a multilingual domain name system, the apparatus comprising:

means for matching the number sequences in the domain name with known number sequences in a collection of known number sequences, wherein each known number sequence is associated with a particular language coding type and the collection includes known number sequences of at least two different language coding types;

means for identifying a language code type associated with a known number sequence that matches the number sequence of the domain name;

means for identifying an appropriate name server to which the domain name is to be forwarded based on the identified type of language coding; and

means for forwarding the domain name to the name server for resolution.

17. The apparatus of claim 16, further comprising means for receiving a DNS request.

18. The apparatus of claim 16, wherein the set of known digit sequences is located in a logical structure that includes attributes comprising the known digit sequences and the encoding type.