Data segregation is the practice of narrowing a data view with filters and searches so that the data becomes easier to understand than it is in its raw form. To that end, Webbee provides several handy data-filtration techniques that let users filter the data however they want.

There are three ways users can filter the data to fit their needs.

  1. Left and Right Side Panel Filters
  2. Search Box
  3. Custom Robots

Left and Right Side Panel Filters

All three modes include a panel that displays the crawl summary to the user and lets them change the view accordingly. See the image below.

[Image: left and right side panel data segregation]

The highlighted box is the filter. By default the “All” box is selected, which displays all the data crawled by the spider. Clicking the “200 OK” filter displays only the webpages that returned a 200 status code. The other filters work the same way, except for the “Crawl Status” area, which only reports the status of the crawl itself, not the data fetched by the application.

Text-Based Filtering – Search Boxes

Every table has a search box on the left side of its panel. The box takes text as input, applies it to the selected column, and filters the data accordingly. See the image below.

[Image: search box filter data segregation]

Note: Both filters (the left/right panel filter and the search-box filter) can be applied simultaneously to narrow the data further. See the image below.

[Image: combined filters data segregation]

In the image above, the panel filter was applied first (200 OK pages) and then a search filter was added by typing “user-guide” into the search box. The result contains only pages with a 200 OK status code that also include “/user-guide” in their URL, since the search filter was applied to the “Page URL” column. The “Filtered:” field likewise shows the count after both filters are applied.
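Conceptually, the two filters compose as a logical AND: a row survives only if it passes the panel filter and the search filter. A minimal Python sketch of that composition (the row fields and helper names here are my own illustration, not Webbee's actual data model or API):

```python
# Illustrative crawl rows; in Webbee these come from the spider itself.
crawled_pages = [
    {"url": "/user-guide/install", "status": 200},
    {"url": "/user-guide/missing", "status": 404},
    {"url": "/blog/news", "status": 200},
]

def panel_filter(rows, status):
    """Left/right panel filter: keep rows with the chosen status code."""
    return [r for r in rows if r["status"] == status]

def search_filter(rows, column, text):
    """Search-box filter: keep rows whose selected column contains the text."""
    return [r for r in rows if text in r[column]]

# Apply both simultaneously: 200 OK pages whose "url" column contains "user-guide".
filtered = search_filter(panel_filter(crawled_pages, 200), "url", "user-guide")
print(filtered)  # only the /user-guide page with status 200 survives both filters
```

The "Filtered:" count shown in the UI would correspond to `len(filtered)` after both steps.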

URL Parameter Filtration

There are cases where a user needs URL filtration, e.g. filtering URLs by certain parameters and excluding them from the crawl. This may sound like a rare requirement, but it is very helpful when a website has user-generated pages such as blog comments, FAQs, and search-result pages. Such pages can do real harm if they are not handled (disallowed in robots.txt and/or marked noindex), since they can create large-scale content duplication and thin-content pages across the site. In those circumstances, identifying such pages can be genuinely tough.

These scenarios can be handled with Webbee's Custom Robots feature, which filters URLs containing the defined parameters out of the crawl and stores them in a separate place so they can be identified. Parameters are defined in Custom Robots just as in a normal robots.txt, e.g.

User-Agent: *
Disallow: /*#
Disallow: /*?
Disallow: /*=
# and so on

Tip: Identify as many parameters as possible and disallow them through the Custom Robots feature to get a clean crawl.
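The `*` wildcard in rules like `Disallow: /*?` follows the Googlebot-style extension of robots.txt, where `*` matches any run of characters (plain robots.txt matching is prefix-only). A minimal Python sketch of that matching, assuming this wildcard behavior; the helper names are mine, not Webbee's implementation:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt Disallow pattern into a compiled regex.
    '*' matches any character run; a trailing '$' anchors the end of the URL
    (Googlebot-style wildcard rules, which this sketch assumes)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_disallowed(path, disallow_rules):
    """True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_rules)

# The parameter rules from the Custom Robots example above.
rules = ["/*#", "/*?", "/*="]

print(is_disallowed("/blog/post?replytocom=42", rules))  # True, matches /*?
print(is_disallowed("/page#comment-3", rules))           # True, matches /*#
print(is_disallowed("/user-guide/install", rules))       # False, crawled normally
```

A URL excluded this way would be kept aside by Custom Robots rather than crawled, which is what makes parameterized duplicate pages easy to spot in one place.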

About Ahmad Ali

Ahmad is the co-founder and CEO of Webbee Inc. He has been working as a digital marketer for the past few years and has worked with some notable names across different industries. He is also the creator of Webbee SEO Spider, one of the most advanced SEO spider tools on the internet.
