I'm writing a little web crawler, and a lot of the links on sites I'm crawling are relative (so they're /robots.txt, for example). How do I convert these relative URLs to absolute URLs (so /robots.txt => http://google.com/robots.txt)? Does Go have a built-in way to do this?
3 Answers
Yes, the standard library can do this with the net/url package. Example (from the standard library):
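package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	// The relative reference, as found in a link on the page.
	u, err := url.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	// The absolute URL of the page the link was found on.
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	// Resolve the reference against the base.
	fmt.Println(base.ResolveReference(u))
	// Output: http://example.com/search?q=dotnet
}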
Notice that you only need to parse the absolute URL once and then you can reuse it over and over.
Not_a_Golfer
Thank you @Not_a_Golfer. Great idea. – Svetoslav Marinov Dec 26 '20 at 21:04
Building on @Not_a_Golfer's solution:
You can also call the base URL's Parse method, which parses a relative or absolute URL in the context of the base and returns the resolved URL.
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	// parse only the base URL
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	// and then use it to parse relative URLs
	u, err := base.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.String())
}
KenanBek
I think you are looking for the ResolveReference method of url.URL.
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	u, err := url.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(base.ResolveReference(u))
}
// gives: http://example.com/search?q=dotnet
I use it in my crawler as well, and it works like a charm!
Iman Mirzadeh