2012 Feb 12
One fundamental issue in designing Numbertron's database is the format of a telephone number.
Here are some examples of phone numbers, as humans talk about them:
What's the best way to represent these? How do you put them all in the same database table?
I chose to normalize them by using the complete E.164 number, including country code and exclusive of any punctuation. Under this scheme, those same numbers are stored as:
When I gather listings from various sources, they are often in national format, sometimes with a 0 or 1 prefix to indicate “long distance”, and only rarely with the country code included. It can be tricky to find the best way to turn that into an E.164 number.
For example, Nepal uses many phone numbers beginning with a 1, meaning “fixed line in Kathmandu”. Nepal's 1 stays there, so its numbers will look like “+977 1 423 1301”. Simple enough.
Conversely, the leading 0 in United Kingdom numbers (country code 44) is always a dialling code and must be removed. In the UK, though, this leading 0 is widely considered to be a part of the area code. Interestingly, it is even written that way in the Wikipedia article that proves some people care way too much about telephone numbers.
The leading 1 in North American numbers, however, is a bit of a special case. It is technically a dialling code, and as such should be stripped from the number before it is normalized, but the country code that we then add is also a 1, so it can be left there without causing a problem.
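The three cases above can be sketched in Python. This is only an illustration, not Numbertron's actual code: the `RULES` table, the two-letter country keys, and the function name `to_e164_digits` are all hypothetical, and the table covers only the countries discussed here.

```python
import re

# Hypothetical rule table: country -> (country code, leading prefix to strip).
# It encodes only the three cases described in this post.
RULES = {
    "NP": ("977", ""),  # Nepal: the leading 1 belongs to the number, strip nothing
    "GB": ("44", "0"),  # United Kingdom: the leading 0 is a dialling code
    "US": ("1", "1"),   # North America: a leading 1 is a dialling code
}

def to_e164_digits(raw: str, country: str) -> str:
    """Return the number as E.164 digits, with no punctuation."""
    country_code, prefix = RULES[country]
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets, "+"
    if raw.strip().startswith("+"):
        return digits  # assume the country code is already present
    if prefix and digits.startswith(prefix):
        digits = digits[len(prefix):]
    return country_code + digits

print(to_e164_digits("1 423 1301", "NP"))
print(to_e164_digits("(020) 7946 0018", "GB"))
print(to_e164_digits("1 (212) 555-0123", "US"))
```

For North America, stripping the 1 and then prepending country code 1 gives the same digits as leaving it alone, which is why the shortcut of keeping it works. The sketch relies on area codes there never starting with 1.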
That's all, right? A phone number is a string of digits 0 through 9, beginning with a country code. The country code is one to three digits long and uniquely identifies where the number is located. The number is up to 15 digits long including country code. Right? Well, we can safely assume so. For now.
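Taken at face value, that definition is easy to check. Here is a minimal sketch, with the hypothetical name `looks_like_e164`. It adds one assumption not stated above, that a country code never begins with 0, and it checks only the shape of the string, not whether the country code is actually assigned.

```python
import re

# One to fifteen digits in total; the first digit is assumed to be non-zero.
E164_DIGITS = re.compile(r"[1-9][0-9]{0,14}")

def looks_like_e164(number: str) -> bool:
    """True if the string is 1 to 15 digits and does not start with 0."""
    return E164_DIGITS.fullmatch(number) is not None
```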
A more interesting facet of the problem is number formatting, to make numbers more pleasant for people to read. I'll write about that next time.