# Google Shopping Scraper using Crawlee

## Introduction
This project is a web scraper built using Crawlee and Playwright to extract product details from Google Shopping. It automates the process of searching for products and retrieving relevant information such as price, seller, and product URLs.
## Features
- Scrapes Google Shopping search results.
- Extracts product details including name, price, and seller information.
- Saves data to structured datasets.
- Handles retries and failed requests efficiently.
- Supports CSV-based product input (sample format shown below).
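For reference, the input file (loaded later as `ts_parsed_data.csv`) is assumed to look something like this. Only the `ean` column is actually referenced by the snippets below; any other columns are illustrative:

```csv
ean,name
8710400311119,Example Product A
8712345678906,Example Product B
```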
## Installation

Before running the scraper, ensure you have Node.js installed. Then install Crawlee and the other dependencies (`fs` is part of Node.js itself and does not need to be installed):

```bash
npm install crawlee playwright csv-parse
```
## Code Implementation

### Setting Up the Crawler
We define an `EcommerceCrawler` class that manages the scraping process:
```typescript
import { PlaywrightCrawler, RequestQueue, Dataset, createPlaywrightRouter } from 'crawlee';
import { readFileSync } from 'fs';
import { parse } from 'csv-parse/sync';

export class EcommerceCrawler {
    private isPaused: boolean = false;

    async search() {
        // Router that dispatches requests to handlers based on their label.
        const router = createPlaywrightRouter();
        const requestQueue = await RequestQueue.open();
        const searchResultsDataset = await Dataset.open('search_results');
        const products = this.loadProductsFromCSV('ts_parsed_data.csv');
        // Request enqueueing and crawler construction (shown below) complete this method.
    }

    // Parse the CSV into an array of objects keyed by the header row.
    private loadProductsFromCSV(path: string): Record<string, string>[] {
        return parse(readFileSync(path, 'utf-8'), { columns: true });
    }
}
```
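The entry point referenced later (`node index.js`) is not shown in the original; a minimal sketch, assuming the class lives in its own module (the file names here are hypothetical):

```typescript
// index.ts — hypothetical entry point (compiled to index.js).
// Crawlee projects run as ESM, so top-level await is available.
import { EcommerceCrawler } from './EcommerceCrawler.js';

await new EcommerceCrawler().search();
```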
### Adding Search Requests
Products from a CSV file are loaded and added to the request queue:
```typescript
for (const product of products) {
    await requestQueue.addRequest({
        // The product's EAN doubles as the queue's deduplication key.
        uniqueKey: product.ean,
        url: 'https://www.google.com/?tbm=shop&hl=nl&gl=nl',
        userData: { product, label: 'search' },
    });
}
```
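Because the EAN is used as `uniqueKey`, Crawlee's request queue deduplicates on it: enqueueing the same product twice, or re-running against the same queue, will not create duplicate search requests.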
### Handling Search Results

Inside the handler for requests labeled `search`, the scraper extracts the relevant details from each result card:
```typescript
// .sh-dgr__content and the selectors below are Google's obfuscated class names;
// they change regularly, so expect to update them.
const products = await page.$$eval('.sh-dgr__content', (cards) =>
    cards.map((card) => ({
        name: card.querySelector('h3.tAxDx')?.textContent?.trim() ?? null,
        price: card.querySelector('span.a8Pemb')?.textContent?.trim() ?? null,
        merchant_name: card.querySelector('div.aULzUe.IuHnof')?.textContent?.trim() ?? null,
    })),
);
```
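The excerpt stops at extraction; presumably the handler then persists these rows. A minimal sketch, assuming the handler also destructures `request` and can reach the `searchResultsDataset` handle opened in `search()` (for example via a closure):

```typescript
// Tag each scraped card with the EAN that produced it, then persist the batch.
await searchResultsDataset.pushData(
    products.map((p) => ({ ean: request.userData.product.ean, ...p })),
);
```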
### Handling Failed Requests
Failed requests are logged and stored in a dataset:
```typescript
const crawler = new PlaywrightCrawler({
    requestQueue,
    headless: true,
    requestHandler: router,
    maxRequestRetries: 6,
    maxConcurrency: 3,
    // Invoked only after a request has exhausted all of its retries.
    failedRequestHandler: async ({ request, log }) => {
        log.error(`Request ${request.url} failed too many times.`);
        const dataset = await Dataset.open('failed_requests');
        await dataset.pushData({ url: request.url, retryCount: request.retryCount });
    },
});
```
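Note that this only configures the crawler; nothing runs until it is started. Presumably the original code kicks it off once the queue is populated:

```typescript
// Process everything that was added to the request queue.
await crawler.run();
```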
## Running the Scraper

To start scraping, run:

```bash
node index.js
```

The results will be saved in `search_results.csv` and `seller_prices.csv`.
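One caveat: Crawlee persists dataset items as JSON under `storage/datasets/<name>/` by default, so the CSV files above presumably come from an export step. A minimal sketch, assuming Crawlee's `Dataset.exportToCSV` helper, which writes the CSV into the default key-value store:

```typescript
import { Dataset } from 'crawlee';

// Export the named dataset to CSV in the default key-value store
// (e.g. storage/key_value_stores/default/search_results.csv).
const dataset = await Dataset.open('search_results');
await dataset.exportToCSV('search_results');
```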
## Conclusion
This Google Shopping scraper efficiently gathers product data and handles search result parsing, making it useful for price monitoring and competitive analysis. 🚀