URL Parts
Decomposes a given URL into its parts.
For example, the URL http://www.tddbuddy.com decomposes into the following parts:
| Part | Value |
|---|---|
| Protocol | http |
| Subdomain | www |
| Domain | tddbuddy.com |
| Port | 80 (default for HTTP) |
| Path | '' (empty in this case) |
Please be sure to handle the following:
- Only top-level domains like .com or .net.
- Do not worry about second-level domains like .co.uk or .co.za.
- Only the protocols specified in the default ports section below.
- Be sure to handle URLs that contain only a local network hostname, e.g. http://localhost
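The domain rules in the list above can be sketched as a small helper. This is only an illustration, not part of the kata: the name `split_host` is hypothetical, and it assumes the protocol, port, and path have already been removed from the host.

```python
from typing import Tuple


def split_host(host: str) -> Tuple[str, str]:
    """Split a hostname into (subdomain, domain).

    Assumes single-label TLDs such as .com or .net only; second-level
    domains like .co.uk are out of scope for this kata.
    """
    labels = host.split(".")
    if len(labels) <= 2:
        # "localhost" (hostname only) or "foo.com" (no subdomain).
        return "", host
    # Everything before the last two labels is treated as the subdomain.
    return ".".join(labels[:-2]), ".".join(labels[-2:])


print(split_host("foo.bar.com"))
print(split_host("localhost"))
```

Treating any host with one or two labels as "no subdomain" lets the hostname-only case share the same code path as a plain domain.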
Default Ports
http: 80, https: 443, ftp: 21, sftp: 22
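One possible way to apply these defaults is a lookup table keyed by protocol. The names `DEFAULT_PORTS` and `resolve_port` are assumptions made for this sketch, as is the choice to raise `ValueError` for an unsupported protocol.

```python
from typing import Optional

# Default ports exactly as listed in the kata.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22}


def resolve_port(protocol: str, explicit_port: Optional[str] = None) -> int:
    """Return the explicit port if one was given, else the protocol default."""
    if explicit_port:
        return int(explicit_port)
    if protocol not in DEFAULT_PORTS:
        # Assumption: protocols outside the table are rejected.
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return DEFAULT_PORTS[protocol]


print(resolve_port("http"))
print(resolve_port("https", "8080"))
```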
Examples
| URL | Protocol | Subdomain | Domain name | Port | Path |
|---|---|---|---|---|---|
| http://foo.bar.com/foobar.html | http | foo | bar.com | 80 | foobar.html |
| https://www.foobar.com:8080/download/install.exe | https | www | foobar.com | 8080 | download/install.exe |
| ftp://foo.com:9000/files | ftp | '' (empty string) | foo.com | 9000 | files |
| https://localhost/index.html#footer | https | '' (empty string) | localhost | 443 | index.html |
Hints
Exclude the leading / when handling path. E.g. /download becomes download.
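A minimal sketch of this hint follows. The helper name `normalize_path` is hypothetical, and it assumes the input is whatever follows the host and port. It also drops any `?parameters` or `#anchor` suffix, which matches the https://localhost/index.html#footer example, where the path is index.html.

```python
def normalize_path(raw: str) -> str:
    """Return the path without its leading "/", parameters, or anchor."""
    # Cut at the first "#" (anchor), then at the first "?" (parameters).
    path = raw.split("#", 1)[0].split("?", 1)[0]
    # Strip a single leading slash, if present.
    return path[1:] if path.startswith("/") else path


print(normalize_path("/download"))
print(normalize_path("/index.html#footer"))
```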
URL Grammar
Below is an EBNF-like grammar for a URL, as used in this kata.
url = protocol "://" [subdomain "."] host [top-level-domain] [":" port] [path] ["?" parameters] ["#" anchor]
protocol = "http" | "https" | "ftp" | "sftp"
subdomain = alphanumeric string starting with alpha
host = alphanumeric string
top-level-domain = ".com" | ".net" | ".org" | ".int" | ".edu" | ".gov" | ".mil"
port = numeric
path = alphanumeric string
parameters = alphanumeric string
anchor = alphanumeric string
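The grammar can be approximated with a single regular expression. This is a sketch under stated assumptions rather than a reference solution: `URL_PATTERN` and `match_url` are hypothetical names, a "." is assumed between the subdomain and the host, and the path is allowed to contain characters such as "/" and "." because the kata's examples use them even though the grammar says "alphanumeric string".

```python
import re

URL_PATTERN = re.compile(
    r"^(?P<protocol>https|http|sftp|ftp)://"
    # Subdomain only counts when another dotted label follows it,
    # so "foo.com" is not misread as subdomain "foo" + host "com".
    r"(?:(?P<subdomain>[A-Za-z][A-Za-z0-9]*)\.(?=[A-Za-z0-9]+\.))?"
    r"(?P<host>[A-Za-z0-9]+)"
    r"(?P<tld>\.(?:com|net|org|int|edu|gov|mil))?"
    r"(?::(?P<port>[0-9]+))?"
    r"(?:/(?P<path>[^?#]*))?"        # leading "/" is excluded from the path
    r"(?:\?(?P<parameters>[^#]*))?"
    r"(?:#(?P<anchor>.*))?$"
)


def match_url(url: str) -> dict:
    """Return the named grammar parts; missing optional parts become ''."""
    match = URL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"URL does not fit the kata grammar: {url!r}")
    return match.groupdict(default="")


parts = match_url("https://localhost/index.html#footer")
print(parts["host"], parts["path"], parts["anchor"])
```

The domain name reported by the kata would then be `host + tld`, and an empty `port` would fall back to the protocol's default.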