Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium. It’s essential for scraping modern, JavaScript-driven websites (Single Page Applications).
Integrating 2extract.com proxies with Puppeteer requires passing the `--proxy-server` argument when launching the browser.
Basic Setup
Here’s how to launch a Puppeteer instance that routes all its traffic through our proxy gateway.
You’ll need `puppeteer` installed in your project: `npm install puppeteer`
```javascript
const puppeteer = require('puppeteer');

// 1. Get these from your proxy's "Connection Details" page
const proxyHost = "proxy.2extract.net";
const proxyPort = 5555;
const proxyUser = "PROXY_USERNAME";
const proxyPass = "PROXY_PASSWORD";

const proxyServer = `http://${proxyHost}:${proxyPort}`;

(async () => {
  console.log('Launching browser with proxy...');
  const browser = await puppeteer.launch({
    headless: false, // Set to true for production, false for debugging
    // 2. Pass the proxy server URL as an argument
    args: [`--proxy-server=${proxyServer}`]
  });

  const page = await browser.newPage();

  // 3. Authenticate the proxy for this page
  await page.authenticate({
    username: proxyUser,
    password: proxyPass
  });

  console.log('Navigating to IP checker...');
  await page.goto('https://api.ipify.org?format=json', { waitUntil: 'networkidle0' });

  // 4. Get the content and verify the IP
  const content = await page.evaluate(() => document.body.textContent);
  console.log('Success! Your proxy IP is:', JSON.parse(content).ip);

  await browser.close();
})();
```
Important: Puppeteer requires you to authenticate the proxy on a per-page basis using `page.authenticate()`. You must call this method for every new page (`browser.newPage()`) you open.
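If your script opens many pages, a small wrapper keeps that step from being forgotten. Here is a minimal sketch, assuming the `proxyUser` and `proxyPass` constants from the setup above; `newProxiedPage` is just an illustrative name, not a Puppeteer API:

```javascript
// Hypothetical helper: create a page that is already authenticated against the proxy.
async function newProxiedPage(browser) {
  const page = await browser.newPage();
  await page.authenticate({ username: proxyUser, password: proxyPass });
  return page;
}

// Usage: const page = await newProxiedPage(browser);
```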
Real-World Example: Taking a Screenshot of Amazon Search Results
A common use case for Puppeteer is to render a full page with dynamic content and take a screenshot, for example, to monitor product rankings or search results on a major eCommerce site like Amazon.
Let’s take a screenshot of the search results for “web scraping books” on amazon.com, making the request appear as if it’s coming from the United States.
```javascript
const puppeteer = require('puppeteer');

// --- Your Base Credentials ---
const BASE_USERNAME = "PROXY_USERNAME";
const PASSWORD = "PROXY_PASSWORD";
const PROXY_HOST = "proxy.2extract.net";
const PROXY_PORT = 5555;
const proxyServer = `http://${PROXY_HOST}:${PROXY_PORT}`;

// --- Target Information ---
const SEARCH_QUERY = "web scraping books";
const TARGET_URL = `https://www.amazon.com/s?k=${encodeURIComponent(SEARCH_QUERY)}`;
const REGION = "us"; // United States

(async () => {
  console.log("Launching Puppeteer browser...");
  const browser = await puppeteer.launch({
    headless: true, // Use true for automated scripts
    args: [`--proxy-server=${proxyServer}`]
  });

  const page = await browser.newPage();

  // Dynamically construct the username for the target region
  const proxyUsername = `${BASE_USERNAME}-country-${REGION}`;

  // Authenticate using the dynamically created username
  await page.authenticate({
    username: proxyUsername,
    password: PASSWORD
  });

  // Set a realistic viewport and user agent to mimic a real user
  await page.setViewport({ width: 1920, height: 1080 });
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36');

  console.log(`Navigating to Amazon search results for "${SEARCH_QUERY}" via a ${REGION.toUpperCase()} proxy...`);

  try {
    // Navigate to the page and wait for the network to be mostly idle
    await page.goto(TARGET_URL, { waitUntil: 'networkidle2', timeout: 60000 });

    // (Optional) A small delay to ensure all dynamic content loads
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Take a screenshot and save it
    await page.screenshot({ path: `amazon_search_${REGION}.png`, fullPage: true });
    console.log(`Success! Screenshot saved as 'amazon_search_${REGION}.png'`);
  } catch (error) {
    console.error(`An error occurred: ${error.message}`);
    // In case of an error, it's useful to save the HTML for debugging
    const errorHtml = await page.content();
    require('fs').writeFileSync('error.html', errorHtml);
    console.log("Saved the page's HTML to error.html for debugging.");
  } finally {
    await browser.close();
    console.log("Browser closed.");
  }
})();
```
Amazon is a challenging target! They employ sophisticated anti-bot measures. While this script works, you may need to implement more advanced techniques for large-scale scraping, such as rotating User-Agents, handling CAPTCHAs, and mimicking human-like browsing behavior (e.g., random delays, mouse movements).
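As a starting point, here is a minimal sketch of two of those techniques, random delays and a rotating User-Agent, built only from standard Puppeteer and Node.js calls. The User-Agent strings and delay bounds are illustrative assumptions, not recommended values:

```javascript
// Small pool of User-Agent strings to rotate through (example values only).
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
];
const randomUserAgent = () => USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];

// Wait a random amount of time between actions to look less robotic.
const randomDelay = (minMs, maxMs) =>
  new Promise(resolve => setTimeout(resolve, minMs + Math.random() * (maxMs - minMs)));

// Inside the script above, before and between navigations:
// await page.setUserAgent(randomUserAgent());
// await randomDelay(1000, 3000);
// await page.mouse.move(200 + Math.random() * 400, 150 + Math.random() * 300);
```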
What This Example Demonstrates
- Scraping a Real, Complex Website: How to approach a major target like Amazon.
- Proxy Authentication in Puppeteer: The correct two-step process of setting `--proxy-server` at launch and calling `page.authenticate()` on each page.
- Dynamic Geo-Targeting: Shows how to change your perceived location by modifying the username (a multi-region sketch follows this list).
- Mimicking a Real User: Demonstrates best practices like setting a realistic viewport and `User-Agent` to reduce the chance of being detected by anti-bot systems.
- Error Handling: Includes a `try...catch` block and saves the page’s HTML on failure, which is a crucial technique for debugging scraping issues.
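Building on the geo-targeting point, the same pattern extends to several regions in one run. A minimal sketch, assuming the `-country-<code>` username format from the example also accepts other ISO country codes (check your provider’s documentation) and reusing the `browser`, `BASE_USERNAME`, `PASSWORD`, and `TARGET_URL` already defined there:

```javascript
// Runs inside an async function/IIFE, after the browser has been launched.
for (const region of ['us', 'de', 'jp']) {
  const page = await browser.newPage();
  // The proxy picks the exit country from the username suffix.
  await page.authenticate({
    username: `${BASE_USERNAME}-country-${region}`,
    password: PASSWORD
  });
  await page.goto(TARGET_URL, { waitUntil: 'networkidle2', timeout: 60000 });
  await page.screenshot({ path: `amazon_search_${region}.png`, fullPage: true });
  await page.close();
}
```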