Data segregation separates the data view using different filters and searches, so that the data becomes easier to understand than in its raw form. For this purpose, Webbee provides several data filtration techniques that let users filter the data however they want.
There are three ways a user can filter the data as per their need:
- Left and Right Side Panel Filters
- Search Box
- Custom Robots
Left and Right Side Panel Filters
A panel in all three modes displays the crawl summary and lets the user change the view accordingly. See the image below.
The highlighted box is the filter. By default, the “All” box is selected, which displays all the data crawled by the spider. Clicking the “200 OK” filter displays only those webpages that have a 200 status code. The other filters work similarly, with the exception of the “Crawl Status” area, which only reports the status of the crawl itself, not the data fetched by the application.
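Conceptually, each panel filter is just a predicate applied over the crawled rows. The following is a minimal sketch of that idea in Python, using hypothetical data; it is not Webbee's actual internals.

```python
# Illustrative sketch of a status-code panel filter over crawled rows
# (hypothetical data; not Webbee's actual implementation).
crawled_pages = [
    {"url": "https://example.com/", "status": 200},
    {"url": "https://example.com/old-page", "status": 404},
    {"url": "https://example.com/user-guide/", "status": 200},
]

def panel_filter(pages, status=None):
    """Return all pages (the default "All" box), or only those
    matching the given status code (e.g. the "200 OK" box)."""
    if status is None:
        return pages
    return [p for p in pages if p["status"] == status]

print(len(panel_filter(crawled_pages)))        # "All" -> 3
print(len(panel_filter(crawled_pages, 200)))   # "200 OK" -> 2
```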
Text-Based Filtering – Search Boxes
In all the tables, a search box can be found at the left side of the panel. It takes text as input, applies it to the selected column, and filters the data. See the image below.
Note: Both filters, the left/right panel filter and the search box filter, can be applied simultaneously to narrow the data further. See the image below.
In the image above, the first filter was applied through the panel (200 OK pages), and then a search filter was applied by typing “user-guide” in the search box. The outcome contains only pages that have a 200 OK status code and that also contain “/user-guide”, since the search filter was applied to the “Page URL” column. The “Filtered:” field likewise shows the count that results from both filters combined.
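The stacking of the two filters can be pictured as two passes over the same rows, each pass narrowing the previous result. Here is a short sketch with hypothetical rows; the logic shown is an assumption about the behavior, not Webbee's code.

```python
# Sketch of stacking the panel filter with a column search
# (hypothetical rows; not Webbee's actual implementation).
pages = [
    {"url": "https://example.com/user-guide/filters", "status": 200},
    {"url": "https://example.com/user-guide/setup", "status": 404},
    {"url": "https://example.com/blog/", "status": 200},
]

# Pass 1: panel filter -- keep only 200 OK pages.
ok_pages = [p for p in pages if p["status"] == 200]

# Pass 2: search filter applied to the "Page URL" column.
filtered = [p for p in ok_pages if "user-guide" in p["url"]]

# Only rows satisfying BOTH conditions remain.
print([p["url"] for p in filtered])
```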
URL Parameter Filtration
There are instances when a user needs URL filtration, e.g. filtering URLs on the basis of some parameter and excluding them from the crawl. This may seem like a case that never happens, but it is very helpful when a website has user-generated pages such as blog comments, FAQ pages, or search pages. Such pages can do real harm if they are not handled (disallowed in robots.txt and/or no-indexed), as they can create large-scale content duplication across the website as well as thin-content pages. In such circumstances, identifying those pages can be really tough.
These scenarios can be handled with Webbee’s Custom Robots feature, which can filter URLs with the defined parameters out of the crawl and store them in a separate place so that they can be identified. Parameters are defined in Custom Robots using the same syntax as a normal robots.txt file.
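As an illustration, parameter rules might look like the following. The parameter names here are hypothetical, chosen only to show the robots.txt-style syntax:

```
User-agent: *
Disallow: /*?replytocom=
Disallow: /*?s=
Disallow: /*?sort=
# and so on
```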
Tip: Identify as many such parameters as possible and disallow them through the Custom Robots feature to get a clean crawl.
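To make the parameter-filtering idea concrete, here is a small sketch of matching URL paths against robots.txt-style Disallow patterns, where `*` matches any sequence of characters. This is a simplified assumption about how such rules behave (it ignores the `$` end anchor, for example), not Webbee's actual matcher.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt Disallow rule into a regex.
    Only '*' is treated as a wildcard; everything else,
    including '?', is matched literally."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern)

def is_disallowed(path, rules):
    """True if the path matches any Disallow rule."""
    return any(rule_to_regex(r).search(path) for r in rules)

# Hypothetical parameter rules, as they might appear in Custom Robots.
rules = ["/*?replytocom=", "/*?s="]

print(is_disallowed("/blog/post?replytocom=42", rules))  # True
print(is_disallowed("/user-guide/filters", rules))       # False
```

Splitting on `*` and re-escaping each piece is a deliberately simple way to get wildcard behavior without letting other regex metacharacters (like the literal `?` in query strings) change the match.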