General
To create the first configuration, click the Create button.
In the next step, you will see the General tab, which contains the main settings, including the configuration name and the URL of the site that you want to scrape.
You can use one of 3 options:
Start from a single page
Use a file with a list of URLs
Download the URLs from a sitemap
Start from a single page - specify the URL of the page that you want to scrape data from
Use a file with a list of URLs - upload a file that contains all the URLs that you want to scrape
Download the URLs from a sitemap - use the site's sitemap to scrape all the products from the specified site
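To illustrate the sitemap option, here is a minimal Python sketch of how product URLs could be collected from a sitemap. It is not the tool's own code: the function name urls_from_sitemap and the sample sitemap are hypothetical, and it assumes a plain sitemap in the sitemaps.org format rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


# Hypothetical sitemap content, used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product-1</loc></url>
  <url><loc>https://example.com/product-2</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
```

Each URL in the returned list would then be handled the same way as an entry in a file with a list of URLs.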
This section allows setting the following parameters:
Scan depth - defines how many levels of links the scraper follows, starting from the entry page
*0 - the scraper will check only the page that is set in the Entry URL field.
*1 - the scraper will scan all the pages on the 1st level (blogs, accounts, about, delivery, related products, etc.)
*100 - unlimited
Number of threads - is used to speed up the scraping process. You can set several threads to run at the same time. We recommend using the default value, 3, to avoid being banned by the site
Wait up to - the time that the application will wait for a response from the server
IMPORTANT NOTE: The free trial version allows only 1 thread and a Wait up to value of 1 second, and you can scrape up to 100 products
Result Data format - a required parameter that defines the format of the data that you will get after scraping. Since different shopping carts use different file formats for import, the software will prepare a ready-to-use file. At the moment, the supported formats are WooCommerce, Magento, Shopify, and PrestaShop, or you can set a Custom format
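The Scan depth values described above can be pictured as a depth-limited walk over the links of a site. The Python sketch below is only one possible reading of that setting, not the tool's implementation: pages_to_scan, UNLIMITED, and the SITE link map are hypothetical names, and the links are taken from an in-memory dictionary instead of real pages.

```python
from collections import deque

UNLIMITED = 100  # the guide treats a scan depth of 100 as "unlimited"


def pages_to_scan(entry_url, links, scan_depth):
    """Breadth-first walk from entry_url, following links up to scan_depth levels."""
    seen = {entry_url}
    order = [entry_url]
    queue = deque([(entry_url, 0)])
    while queue:
        url, depth = queue.popleft()
        # Stop following links once the configured depth is reached.
        if scan_depth != UNLIMITED and depth >= scan_depth:
            continue
        for linked in links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                order.append(linked)
                queue.append((linked, depth + 1))
    return order


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/about", "/shop"],
    "/shop": ["/shop/item-1", "/shop/item-2"],
}

print(pages_to_scan("/", SITE, 0))    # only the entry page
print(pages_to_scan("/", SITE, 1))    # entry page plus 1st-level pages
print(pages_to_scan("/", SITE, 100))  # every reachable page
```

With depth 0 only the Entry URL is visited, with depth 1 the pages linked directly from it are added, and with 100 the walk continues until no new pages are found.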
The next section is used for scraping variations:
Fill variations empty fields*
Export column with row type (product/variation)*
Prevent duplicate variations - we recommend checking this option if you scrape products with variations, to avoid duplicates
*The first 2 options will be activated if you select a Custom data format
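As a rough illustration of what preventing duplicate variations could involve, the Python sketch below keeps only the first row for each combination of parent product and attribute values. The guide does not describe how the tool detects duplicates, so the function drop_duplicate_variations and the field names parent_sku and attributes are assumptions made for this example.

```python
def drop_duplicate_variations(rows):
    """Keep the first row for each (parent SKU, attribute set) pair."""
    seen = set()
    unique = []
    for row in rows:
        # Sort the attributes so that their order does not affect the comparison.
        key = (row["parent_sku"], tuple(sorted(row["attributes"].items())))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical scraped rows, used only for illustration.
rows = [
    {"parent_sku": "TSHIRT", "attributes": {"size": "M", "color": "red"}},
    {"parent_sku": "TSHIRT", "attributes": {"color": "red", "size": "M"}},  # same variation
    {"parent_sku": "TSHIRT", "attributes": {"size": "L", "color": "red"}},
]

print(len(drop_duplicate_variations(rows)))
```

Here the second row repeats the first one with the attributes in a different order, so only two variations remain.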
Here you can specify the configuration name - it is filled automatically when you enter the URL, but you can change it if needed.
Note that this tool allows you to set up separate configurations for different stores, so using the store name or another specific parameter will help you identify which configuration was used for which store
You can also add comments in the Comment section
Advanced settings are used for the specific
User agent - you can specify a user agent string to identify yourself (e.g. Mozilla) or skip this option
Wait up to - use this setting if the page takes a long time to load. We recommend setting 1-5 seconds
Scroll page to the bottom before data parsing - check this option to make sure that the scraper scans the page to the bottom
Always load images - check this option if you want to load the images
In most cases, this option is used for Magento stores. Magento uses Fotorama for the Media Gallery, so check this option to load all the images
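For readers who want to see what the User agent and Wait up to settings correspond to at the HTTP level, here is a minimal Python sketch using the standard urllib.request module. It is not how the tool itself is implemented: build_request and fetch are hypothetical helpers, and treating Wait up to as a request timeout in seconds is an assumption.

```python
import urllib.request


def build_request(url, user_agent=None):
    """Create a request, adding a User-Agent header only when one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent=None, wait_up_to=5.0):
    """Download a page, giving up after wait_up_to seconds (assumed meaning of the setting)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=wait_up_to) as response:
        return response.read()


# Build a request without sending it; the URL and agent string are examples only.
request = build_request("https://example.com/", user_agent="Mozilla/5.0")
print(request.get_header("User-agent"))
```

Calling fetch("https://example.com/", user_agent="Mozilla/5.0", wait_up_to=3) would send the request and raise an error if the server does not answer within 3 seconds.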