Spider Speed and Turbo Crawl
The Spider Speed option lets the user increase the speed of the crawler. By default, the application creates 5 threads that scrape website data in parallel. If more speed is required (and the computer hardware and internet connection are capable of handling it), it can be increased with the Spider Speed option in the Spider menu. Below is an image of how this option looks.
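The "5 threads in parallel" behavior above can be sketched with a standard thread pool. This is only an illustration of the idea that spider speed equals the worker-thread count; Webbee's internals are not public, and all names below are hypothetical.

```python
# Minimal sketch: a pool of worker threads fetching pages in parallel.
# SPIDER_SPEED mirrors the default of 5 threads described above.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

SPIDER_SPEED = 5  # default; raising it raises crawl speed

def fetch(url):
    # Fetch one URL and report its HTTP status (body is discarded here).
    with urlopen(url, timeout=10) as resp:
        return url, resp.status

def crawl(urls, fetch=fetch, speed=SPIDER_SPEED):
    # pool.map preserves input order, so results line up with urls.
    with ThreadPoolExecutor(max_workers=speed) as pool:
        return list(pool.map(fetch, urls))
```

Increasing `speed` helps only while the machine and connection can keep the extra threads busy, which is why the option is gated on hardware and bandwidth.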
The ‘Connection Timeout’ option sets how long the crawler waits when connecting to a URL; it can be set between 1 and 15 seconds. If the webpage does not respond within that span, the application throws a ‘Connection Timeout Exception’, which is handled gracefully. This option also gives the user an idea of the website's response time.
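The timeout behavior can be sketched with the standard library as follows. The 1–15 second range mirrors the option's limits; the function name and the exact exception Webbee raises internally are assumptions for illustration.

```python
# Sketch of a per-URL connection timeout: measure response time, and
# treat a timeout or connection failure as the handled error case.
import socket
import time
from urllib.request import urlopen
from urllib.error import URLError

def check_response_time(url, timeout=5):
    """Return seconds until the server responds, or None on timeout/failure."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout):
            return time.monotonic() - start
    except (URLError, socket.timeout):
        # Corresponds to the handled 'Connection Timeout Exception' above.
        return None
```

Measuring the elapsed time like this is also how the option doubles as a rough indicator of how quickly the website responds.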
There is also a ‘Turbo Crawl’ option. It saves the user's time by crawling the website at turbo speed without a manual spider-speed setting; it sets the spider speed to turbo automatically.
Requirements for Turbo Crawl
Turbo Crawl requires good hardware to be used effectively. Make sure you have at least a ‘Core i’ series processor and a fast internet connection before enabling this option.
Note: While Turbo Crawl is enabled, all filter and search options are inaccessible to safeguard data integrity. They become accessible again once the crawl completes and/or is finalized by the user. The option can be found in the Spider menu. Below is a snapshot.
Scenarios where Turbo Crawl is Recommended
Below are some scenarios where Turbo Crawl is worth using.
- When a deep crawl is required: make sure the system has enough memory to crawl the maximum number of URLs from a website.
- When a large amount of data is required: Webbee crawls every bit of each webpage and sorts and organizes it into different files for the user. Read the “Download Options and Advance Reports Download” section for more details.
- When sitemap extraction is required and the targeted website lists large sitemaps, or a large number of sitemaps, in its robots.txt. (If robots.txt does not contain the sitemap URLs, manual input can be used for sitemap crawling; read the Custom Robots section.)
- When only header status codes are required, i.e. how many webpages on the site return 200 OK, redirects, and/or 404 Not Found.
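The headers-only scenario in the last bullet can be sketched as below: issue HEAD requests so only headers travel over the wire, then tally the status codes. The function names are illustrative, not Webbee's actual API.

```python
# Hedged sketch of a headers-only status check: HEAD requests plus a
# Counter of the resulting status codes (200, 301, 404, ...).
from collections import Counter
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def status_of(url):
    # HEAD request: the server sends headers but no body.
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as err:   # 4xx/5xx responses still carry a code
        return err.code
    except URLError:
        return None            # unreachable or timed out

def tally(urls):
    # Map each URL to its status and count occurrences per code.
    return Counter(status_of(u) for u in urls)
```

Because no page bodies are downloaded, this kind of crawl is far lighter than a full scrape, which is why turbo speed suits it well.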