URL Parts


Decomposes a given URL into its parts.

For example, the URL http://www.tddbuddy.com decomposes into the following parts:

Protocol : http
Subdomain : www
Domain : tddbuddy.com
Port : 80 (default for HTTP)
Path : '' (empty in this case)

Please be sure to handle the following:

  • Only top-level domains like .com or .net.
    • Do not worry about second-level domains like .co.uk or .co.za.
  • Only the protocols specified in the default ports section below.
  • Hostname-only URLs on a local network, e.g. http://localhost.
Do not use built-in classes like Uri to solve this.
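
The domain rules above can be sketched as a small host-splitting helper. This is a minimal sketch in Python, not a reference solution: the name split_host is hypothetical, it assumes every top-level domain is a single label (as the kata allows), and it treats any extra leading labels as part of the subdomain. It uses plain string operations only, in keeping with the rule against built-in URL classes.

```python
def split_host(host: str) -> tuple:
    """Split a host such as 'foo.bar.com' into (subdomain, domain).

    Assumption: the top-level domain is always one label (.com, .net, ...),
    so the domain is at most the last two labels.
    """
    labels = host.split(".")
    if len(labels) <= 2:
        # 'localhost' (hostname only) or 'foo.com' (name plus TLD):
        # there is no subdomain in either case.
        return "", host
    # Three or more labels: the last two form the domain,
    # everything before them is reported as the subdomain.
    return ".".join(labels[:-2]), ".".join(labels[-2:])
```

With this helper, split_host("localhost") yields an empty subdomain and "localhost" as the domain, which matches the hostname-only case in the list above.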

Default Ports

http: 80, https: 443, ftp: 21, sftp: 22
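
One way to apply these defaults is a lookup table keyed by protocol. The sketch below is illustrative only: DEFAULT_PORTS and resolve_port are hypothetical names, and rejecting unknown protocols with ValueError is an assumption, since the kata does not say how unsupported protocols should be reported.

```python
# Default ports taken from the list above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22}


def resolve_port(protocol: str, explicit_port: str = "") -> int:
    """Return the explicit port if one was given, else the protocol default."""
    if protocol not in DEFAULT_PORTS:
        # Assumption: unsupported protocols are an error.
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if explicit_port:
        return int(explicit_port)
    return DEFAULT_PORTS[protocol]
```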

Examples

URL: http://foo.bar.com/foobar.html
Protocol: http
Subdomain: foo
Domain name: bar.com
Port: 80
Path: foobar.html

URL: https://www.foobar.com:8080/download/install.exe
Protocol: https
Subdomain: www
Domain name: foobar.com
Port: 8080
Path: download/install.exe

URL: ftp://foo.com:9000/files
Protocol: ftp
Subdomain: '' (empty string)
Domain name: foo.com
Port: 9000
Path: files

URL: https://localhost/index.html#footer
Protocol: https
Subdomain: '' (empty string)
Domain name: localhost
Port: 443
Path: index.html

Hints

Exclude the leading / when handling path. E.g. /download becomes download.
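
The hint can be sketched as a small path-cleaning step. The function name clean_path is hypothetical. Dropping the "#" anchor follows the localhost example, where index.html#footer is reported as index.html; dropping "?" parameters the same way is an assumption based on the grammar listing them separately from the path.

```python
def clean_path(raw: str) -> str:
    """Turn the raw part after the host into the reported path."""
    # Cut off parameters and anchor (assumption: neither belongs to the path).
    for marker in ("?", "#"):
        raw = raw.split(marker, 1)[0]
    # Exclude the single leading slash, as the hint describes.
    return raw[1:] if raw.startswith("/") else raw
```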

URL Grammar

Below is an EBNF-like grammar for a URL as per this kata.

url = protocol "://" [subdomain "."] host [top-level-domain] [":" port] [path] ["?" parameters] ["#" anchor]
protocol = "http" | "https" | "ftp" | "sftp"
subdomain = alphanumeric string starting with alpha
host = alphanumeric string
top-level-domain = ".com" | ".net" | ".org" | ".int" | ".edu" | ".gov" | ".mil"
port = numeric
path = alphanumeric string
parameters = alphanumeric string
anchor = alphanumeric string
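
Read left to right, the grammar suggests a parser that peels off one part at a time. The following is a minimal sketch under stated assumptions, not a definitive solution: UrlParts and decompose are hypothetical names, invalid input raises ValueError (the kata does not specify error handling), and parameters and anchor are discarded because the examples do not report them. It relies on string operations only and avoids built-in URL classes.

```python
from dataclasses import dataclass

# Default ports from the "Default Ports" section.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22}


@dataclass(frozen=True)
class UrlParts:
    protocol: str
    subdomain: str
    domain: str
    port: int
    path: str


def decompose(url: str) -> UrlParts:
    """Decompose a URL into the parts listed in this kata."""
    protocol, separator, rest = url.partition("://")
    if not separator or protocol not in DEFAULT_PORTS:
        raise ValueError(f"missing or unsupported protocol in {url!r}")

    # Drop the anchor, then the parameters (assumption: neither is reported).
    rest = rest.split("#", 1)[0].split("?", 1)[0]

    # Everything before the first "/" is the host plus an optional port;
    # partition also removes that leading "/" from the path.
    authority, _, path = rest.partition("/")
    host, _, port_text = authority.partition(":")
    if not host:
        raise ValueError(f"missing host in {url!r}")
    port = int(port_text) if port_text else DEFAULT_PORTS[protocol]

    # Assumption: single-label TLDs only, so the domain is the last two
    # labels and any labels before them form the subdomain.
    labels = host.split(".")
    if len(labels) > 2:
        subdomain, domain = ".".join(labels[:-2]), ".".join(labels[-2:])
    else:
        subdomain, domain = "", host  # e.g. 'foo.com' or 'localhost'

    return UrlParts(protocol, subdomain, domain, port, path)
```

Calling decompose("ftp://foo.com:9000/files") gives protocol "ftp", an empty subdomain, domain "foo.com", port 9000 and path "files", in line with the third example.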