Google Shopping Scraper using Crawlee

Introduction

This project is a web scraper built using Crawlee and Playwright to extract product details from Google Shopping. It automates the process of searching for products and retrieving relevant information such as price, seller, and product URLs.

Features

  • Scrapes Google Shopping search results.
  • Extracts product details including name, price, and seller information.
  • Saves data to structured datasets.
  • Handles retries and failed requests efficiently.
  • Supports CSV-based product input.

Installation

Before running the scraper, ensure you have Node.js installed. Then install Crawlee and the other dependencies (fs is a Node.js built-in module and does not need to be installed):

npm install crawlee playwright csv-parse

Code Implementation

Setting Up the Crawler

We define an EcommerceCrawler class that manages the scraping process:

import { PlaywrightCrawler, RequestQueue, Dataset, createPlaywrightRouter } from 'crawlee';
import { readFileSync } from 'fs';
import { parse } from 'csv-parse/sync';

export class EcommerceCrawler {
    private isPaused: boolean = false;

    // Read the product list from a CSV file, returning one record per row.
    private loadProductsFromCSV(path: string) {
        const file = readFileSync(path, 'utf-8');
        return parse(file, { columns: true, skip_empty_lines: true });
    }

    async search() {
        const router = createPlaywrightRouter();
        const requestQueue = await RequestQueue.open();
        const searchResultsDataset = await Dataset.open('search_results');
        const products = this.loadProductsFromCSV('ts_parsed_data.csv');
    }
}
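
A hypothetical index.ts entry point (the file name and import path are assumptions) then only needs to instantiate the class and call search():

// index.ts -- hypothetical entry point
import { EcommerceCrawler } from './EcommerceCrawler';

const crawler = new EcommerceCrawler();
await crawler.search();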

Adding Search Requests

The products loaded from the CSV file are added to the request queue, with each EAN used as the unique key so duplicates are skipped:

for (const product of products) {
    await requestQueue.addRequest({
        uniqueKey: product.ean,
        // Google Shopping landing page (Dutch locale); the query is not in
        // the URL, so the product travels in userData for the 'search' handler.
        url: 'https://www.google.com/?tbm=shop&hl=nl&gl=nl',
        userData: { product, label: 'search' },
    });
}
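
For reference, a minimal ts_parsed_data.csv that satisfies this loop only needs an ean column; the name column here is illustrative and the values are placeholders:

ean,name
1234567890123,Example Product A
9876543210987,Example Product B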

Handling Search Results

Once a search has been performed, the scraper extracts the relevant details from each result card:

// Each '.sh-dgr__content' element is one product card in the results grid;
// these class names are generated by Google and may change without notice.
const products = await page.$$eval('.sh-dgr__content', (cards) => {
    return cards.map((card) => {
        const nameElement = card.querySelector('h3.tAxDx');
        const priceElement = card.querySelector('span.a8Pemb');
        const merchantNameElement = card.querySelector('div.aULzUe.IuHnof');
        return {
            name: nameElement ? nameElement.textContent.trim() : null,
            price: priceElement ? priceElement.textContent.trim() : null,
            merchant_name: merchantNameElement ? merchantNameElement.textContent.trim() : null,
        };
    });
});
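
For context, this extraction lives inside the router's handler for the 'search' label. The sketch below is one possible wiring, reusing the searchResultsDataset opened earlier; the search-box selector and the EAN-as-query step are assumptions, since the request URL above opens Google Shopping without a query string:

router.addHandler('search', async ({ page, request, log }) => {
    const { product } = request.userData;
    log.info(`Searching Google Shopping for EAN ${product.ean}`);

    // Submit the search manually; 'textarea[name="q"]' matches Google's
    // current search box but may need adjusting to the live page.
    await page.fill('textarea[name="q"]', product.ean);
    await page.keyboard.press('Enter');
    await page.waitForSelector('.sh-dgr__content');

    // Extraction logic from the snippet above, shortened with optional chaining.
    const products = await page.$$eval('.sh-dgr__content', (cards) =>
        cards.map((card) => ({
            name: card.querySelector('h3.tAxDx')?.textContent?.trim() ?? null,
            price: card.querySelector('span.a8Pemb')?.textContent?.trim() ?? null,
            merchant_name: card.querySelector('div.aULzUe.IuHnof')?.textContent?.trim() ?? null,
        }))
    );

    await searchResultsDataset.pushData({ ean: product.ean, results: products });
});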

Handling Failed Requests

Requests that exhaust their retries are logged and stored in a separate dataset:

const crawler = new PlaywrightCrawler({
    requestQueue,
    headless: true,
    requestHandler: router,
    maxRequestRetries: 6, // retry each request up to 6 times before giving up
    maxConcurrency: 3,    // keep concurrency low to reduce the risk of blocking
    failedRequestHandler: async ({ request, log }) => {
        log.error(`Request ${request.url} failed too many times.`);
        const dataset = await Dataset.open('failed_requests');
        await dataset.pushData({ url: request.url, retryCount: request.retryCount });
    },
});
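
With the queue populated and the router's handlers registered, a single call starts the crawl; failedRequestHandler only fires after a request has exhausted its retries:

// Process the queue until it is empty, retrying failures along the way.
await crawler.run();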

Running the Scraper

To start scraping, run the entry point (the source is TypeScript, so compile it first or use a runner such as tsx):

node index.js

The scraped items are saved to the search_results and seller_prices datasets, which can be exported as search_results.csv and seller_prices.csv.
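
By default, Crawlee stores each dataset as JSON items under storage/datasets/<name>/. A minimal sketch of producing the CSV output with Crawlee's built-in export helper (the file lands in the default key-value store, e.g. storage/key_value_stores/default/search_results.csv):

import { Dataset } from 'crawlee';

// Export the named dataset's items as a single CSV file.
const results = await Dataset.open('search_results');
await results.exportToCSV('search_results');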

Conclusion

This Google Shopping scraper efficiently gathers product data and handles search result parsing, making it useful for price monitoring and competitive analysis. 🚀

