The Filter tab allows you to exclude specific pages or data from scraping, for example blogs, info pages, etc.

  • Follow ignore lists from robots.txt - checked by default; links disallowed in robots.txt (e.g. Disallow: /cart, Disallow: /orders, Disallow: /checkouts/) are skipped.

  • Include URLs (regex rules) - manually add rules matching the pages you want scraped (products, items, etc.)

  • Exclude URLs (regex rules) - add rules to skip specific pages from scraping (login, account, cart, blog, PDF files, etc.)

  • Ignore elements in HTML on page - exclude specific HTML elements from scraping by class or id (for example id="Footer", to exclude the footer)
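The include/exclude URL rules above can be sketched as a simple regex filter. This is a minimal, hypothetical illustration (the rule lists and the should_scrape function are assumptions, not the product's actual implementation): a URL is skipped if it matches any exclude rule, and, when include rules are set, it must match at least one of them.

```python
import re

# Hypothetical example rules, mirroring the filter tab's settings.
INCLUDE_RULES = [r"/products/", r"/items/"]
EXCLUDE_RULES = [r"/login", r"/account", r"/cart", r"/blog", r"\.pdf$"]

def should_scrape(url: str) -> bool:
    """Return True if the URL passes the include/exclude regex filters."""
    # Exclude rules take priority: any match means the page is skipped.
    if any(re.search(pattern, url) for pattern in EXCLUDE_RULES):
        return False
    # If include rules are defined, the URL must match at least one.
    if INCLUDE_RULES:
        return any(re.search(pattern, url) for pattern in INCLUDE_RULES)
    return True

print(should_scrape("https://example.com/products/blue-shirt"))  # True
print(should_scrape("https://example.com/cart"))                 # False
print(should_scrape("https://example.com/files/manual.pdf"))     # False
```

Rules are plain regular expressions, so anchors and character classes work as usual: r"\.pdf$" skips only URLs ending in .pdf, while r"/blog" skips anything whose path contains /blog.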