Go is a fantastic choice for building high-performance, concurrent web scrapers. Integrating 2extract.com proxies is straightforward using the standard library’s net/http package by configuring a custom http.Transport.
Basic Setup
The key to using a proxy with authentication in Go is to create a custom http.Client with a Transport whose Proxy field is set.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// 1. Get these from your proxy's "Connection Details" page
	proxyHost := "proxy.2extract.net"
	proxyPort := 5555
	proxyUser := "PROXY_USERNAME"
	proxyPass := "PROXY_PASSWORD"

	// 2. Construct the proxy URL with credentials
	proxyURLString := fmt.Sprintf("http://%s:%s@%s:%d", proxyUser, proxyPass, proxyHost, proxyPort)
	proxyURL, err := url.Parse(proxyURLString)
	if err != nil {
		log.Fatalf("Failed to parse proxy URL: %v", err)
	}

	// 3. Create a custom HTTP transport and set the Proxy
	transport := &http.Transport{
		Proxy: http.ProxyURL(proxyURL),
	}

	// 4. Create a custom HTTP client using the transport
	client := &http.Client{
		Transport: transport,
	}

	// 5. Make your request!
	fmt.Println("Making request to IP checker...")
	resp, err := client.Get("https://api.ipify.org?format=json")
	if err != nil {
		log.Fatalf("Failed to make request: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("Failed to read response body: %v", err)
	}

	fmt.Printf("Success! Your proxy IP is: %s\n", string(body))
}
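Two refinements are worth considering before using this in production. Building the URL with fmt.Sprintf breaks if the password contains characters like @ or :, whereas constructing a url.URL with url.UserPassword handles the escaping for you. And http.Client has no timeout by default, so a stalled proxy connection can hang forever. Here is a minimal sketch of both fixes; newProxyClient and the 30-second value are our own illustrative choices, not anything mandated by 2extract:

package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newProxyClient is a hardened variant of the setup above.
// url.UserPassword escapes any special characters in the credentials
// (which fmt.Sprintf does not), and Timeout stops a stalled proxy
// connection from hanging the scraper.
func newProxyClient(host string, port int, user, pass string) *http.Client {
	proxyURL := &url.URL{
		Scheme: "http",
		User:   url.UserPassword(user, pass),
		Host:   fmt.Sprintf("%s:%d", host, port),
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   30 * time.Second, // arbitrary starting point; tune per target
	}
}

func main() {
	client := newProxyClient("proxy.2extract.net", 5555, "PROXY_USERNAME", "PROXY_PASSWORD")
	resp, err := client.Get("https://api.ipify.org?format=json")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Status:", resp.Status)
}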
Real-World Example: Scraping Hacker News Headlines
Let’s build a simple but efficient Go application to scrape the titles of the top stories from the Hacker News homepage (news.ycombinator.com), a classic scraping target.
This example demonstrates how to create a reusable function to perform the scraping, making it easy to adapt for other targets. We will target Hacker News as if we were browsing from the United Kingdom (GB), by appending a country flag to the proxy username.
This example uses the goquery package for easy, jQuery-style HTML parsing. Install it with:
go get github.com/PuerkitoBio/goquery
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"

	"github.com/PuerkitoBio/goquery"
)

// --- Your Base Credentials ---
const (
	BASE_USERNAME = "PROXY_USERNAME"
	PASSWORD      = "PROXY_PASSWORD"
	PROXY_HOST    = "proxy.2extract.net"
	PROXY_PORT    = 5555
)

// createProxyClient creates and returns an http.Client configured with our proxy.
func createProxyClient(region string) (*http.Client, error) {
	// Dynamically construct the username for the target region
	proxyUser := fmt.Sprintf("%s-country-%s", BASE_USERNAME, region)
	proxyURLString := fmt.Sprintf("http://%s:%s@%s:%d", proxyUser, PASSWORD, PROXY_HOST, PROXY_PORT)
	proxyURL, err := url.Parse(proxyURLString)
	if err != nil {
		return nil, fmt.Errorf("failed to parse proxy URL: %w", err)
	}

	transport := &http.Transport{
		Proxy: http.ProxyURL(proxyURL),
	}
	client := &http.Client{
		Transport: transport,
	}
	return client, nil
}

func main() {
	targetRegion := "gb" // Great Britain
	fmt.Printf("--- Scraping Hacker News from %s ---\n", targetRegion)

	// Create our custom, proxy-enabled HTTP client
	client, err := createProxyClient(targetRegion)
	if err != nil {
		log.Fatal(err)
	}

	// Make the request to Hacker News
	resp, err := client.Get("https://news.ycombinator.com")
	if err != nil {
		log.Fatalf("Failed to scrape Hacker News: %v", err)
	}
	defer resp.Body.Close()

	// resp.Status already includes the numeric code, e.g. "403 Forbidden"
	if resp.StatusCode != 200 {
		log.Fatalf("Hacker News returned a non-200 status: %s", resp.Status)
	}

	// Load the HTML document into goquery
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatalf("Failed to parse HTML: %v", err)
	}

	fmt.Println("Top 5 Headlines:")
	// Find the story links and print their titles
	doc.Find("tr.athing .titleline > a").Each(func(i int, s *goquery.Selection) {
		if i < 5 { // Limit to the top 5 for this example
			title := s.Text()
			fmt.Printf("%d. %s\n", i+1, title)
		}
	})
}
How to Run
- Make sure you have Go installed.
- Install the goquery package: go get github.com/PuerkitoBio/goquery
- Save the code as hn_scraper.go.
- Run from your terminal: go run hn_scraper.go
Expected Output
--- Scraping Hacker News from gb ---
Top 5 Headlines:
1. Show HN: I built a terminal UI for ChatGPT
2. The unreasonable effectiveness of just showing up everyday
3. Why are manhole covers round?
4. A 16-bit virtual machine from scratch
5. SQLite is not a toy database
This example shows how to structure a Go scraping application cleanly, separating proxy-client creation from the main scraping logic. The pattern extends naturally to concurrent scraping with goroutines, as sketched below.
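For instance, here is a minimal sketch of that extension, fanning one request per region out in parallel with a sync.WaitGroup. It reuses createProxyClient from the example above; the scrapeConcurrently helper and its log format are our own illustrative choices:

// Sketch: run one scrape per region concurrently, reusing
// createProxyClient from the example above. Requires "sync" in
// addition to the imports already shown.
func scrapeConcurrently(regions []string) {
	var wg sync.WaitGroup
	for _, region := range regions {
		wg.Add(1)
		// Pass region as a parameter so each goroutine gets its own copy.
		go func(region string) {
			defer wg.Done()
			client, err := createProxyClient(region)
			if err != nil {
				log.Printf("[%s] client setup failed: %v", region, err)
				return
			}
			resp, err := client.Get("https://news.ycombinator.com")
			if err != nil {
				log.Printf("[%s] request failed: %v", region, err)
				return
			}
			resp.Body.Close()
			fmt.Printf("[%s] got status %s\n", region, resp.Status)
		}(region)
	}
	wg.Wait() // Block until every goroutine has finished.
}

Calling scrapeConcurrently([]string{"gb", "de", "us"}) from main fires all three requests at once; since each goroutine builds its own proxy client, a slow exit node in one region never blocks the others.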