Crawl scope

When defining a web application in the wizard, you must select a crawl scope setting. For an authenticated scan, always make sure the login link is the first link.

Note: The number of links crawled within the selected scope is limited by the Maximum Links To Crawl setting in Option Profile > Scan Parameters. The crawl scope is honored until this limit is reached, and the links collected up to that point are crawled. After the limit is reached, links continue to be collected, but they are not crawled, even if they fall within the crawl scope.
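The interaction between scope and the link limit can be modeled as a small sketch. It assumes that in-scope links are crawled in the order they are discovered (so a login link placed first is crawled first). The function name `split_by_max_links` is hypothetical and does not describe the scanner's actual implementation.

```python
from typing import Iterable, List, Tuple


def split_by_max_links(in_scope_links: Iterable[str], max_links: int) -> Tuple[List[str], List[str]]:
    """Hypothetical model of the Maximum Links To Crawl limit.

    Links are assumed to arrive in discovery order and to be in scope already.
    The first `max_links` are crawled; any later ones are only collected.
    """
    crawled: List[str] = []
    collected_only: List[str] = []
    for link in in_scope_links:
        if len(crawled) < max_links:
            crawled.append(link)
        else:
            # Limit reached: the link is recorded but not crawled.
            collected_only.append(link)
    return crawled, collected_only


links = [
    "http://www.example.org/login",
    "http://www.example.org/news/",
    "http://www.example.org/support",
]
crawled, collected_only = split_by_max_links(links, max_links=2)
print(crawled)         # the login link and /news/
print(collected_only)  # /support is collected but not crawled
```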

The following settings are available.

Limit to URL hostname (abc.xyz)

Select this setting to limit crawling to the hostname in the URL, using HTTP or HTTPS and any port. For example, if your starting URL is http://www.example.org/news/, all links discovered on the www.example.org host will be crawled, including http://www.example.org/support and https://www.example.org:8080/logout. Links to any other host, including subdomains of www.example.org, will not be followed. This means http://www2.example.org and http://cdn.www.example.org/ will not be crawled.
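The hostname rule can be expressed as a scope check. This is a minimal sketch, assuming that "in scope" means an exact hostname match with the scheme restricted to HTTP or HTTPS and the port ignored; `in_hostname_scope` is a hypothetical helper, not part of the product.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def in_hostname_scope(url: str, start_url: str) -> bool:
    """Hypothetical check for 'Limit to URL hostname'.

    In scope: HTTP or HTTPS, and the hostname exactly equals the starting
    URL's hostname. Other hosts, including subdomains, are out of scope.
    """
    link = urlsplit(url)
    if link.scheme.lower() not in ALLOWED_SCHEMES or link.hostname is None:
        return False
    # .hostname is lowercased and excludes the port, so ":8080" is ignored.
    return link.hostname == urlsplit(start_url).hostname


start = "http://www.example.org/news/"
for candidate in [
    "http://www.example.org/support",
    "https://www.example.org:8080/logout",
    "http://www2.example.org",
    "http://cdn.www.example.org/",
]:
    print(candidate, in_hostname_scope(candidate, start))
```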

Limit to content located at or below URL subdirectory

Select this setting to crawl only links located at or below the subdirectory of the starting URL, using HTTP or HTTPS and any port. For example, if your starting URL is http://www.example.org/news/, all links under the /news/ directory of www.example.org will be crawled, such as http://www.example.org/news/headlines and https://www.example.org:8080/news/. Links such as http://www.example.org/agenda and http://www2.example.org will not be crawled.
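The subdirectory rule adds a path-prefix test to the hostname test. This is a minimal sketch, assuming that the starting URL's "subdirectory" is its path up to and including the last "/", and that paths are compared as plain strings (no normalization of `..` or percent-encoding). Both helper names are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def start_directory(path: str) -> str:
    """Directory part of a path, up to and including the last '/'.

    Assumption: "/news/" -> "/news/", "/news/index.html" -> "/news/", "" -> "/".
    """
    return path[: path.rfind("/") + 1] or "/"


def in_subdirectory_scope(url: str, start_url: str) -> bool:
    """Hypothetical check for 'Limit to content located at or below URL subdirectory'."""
    link = urlsplit(url)
    start = urlsplit(start_url)
    if link.scheme.lower() not in ALLOWED_SCHEMES or link.hostname is None:
        return False
    # Same host required; scheme and port may differ.
    if link.hostname != start.hostname:
        return False
    return (link.path or "/").startswith(start_directory(start.path))


start = "http://www.example.org/news/"
print(in_subdirectory_scope("https://www.example.org:8080/news/", start))
print(in_subdirectory_scope("http://www.example.org/agenda", start))
```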

Limit to URL hostname and specified sub-domain

Select this setting to crawl only the URL hostname and one specified sub-domain, using HTTP or HTTPS and any port. For example, if your starting URL is http://www.example.org/news/ and the specified sub-domain is cdn.example.org, all links discovered in www.example.org, in cdn.example.org, and in any subdomain of cdn.example.org will be crawled. This means the following URLs will be crawled: http://www.example.org/support, https://www.example.org:8080/logout, http://cdn.example.org/images/ and http://videos.cdn.example.org. Links whose domain neither matches the web application URL hostname nor is cdn.example.org or one of its subdomains will not be followed. This means http://videos.example.org will not be crawled.
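The sub-domain rule accepts the starting hostname, the specified sub-domain itself, and anything beneath it. This is a minimal sketch under those assumptions; `in_subdomain_scope` is a hypothetical helper. Matching on `"." + sub` rather than a bare suffix keeps an unrelated host such as evilcdn.example.org out of scope.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def in_subdomain_scope(url: str, start_url: str, subdomain: str) -> bool:
    """Hypothetical check for 'Limit to URL hostname and specified sub-domain'."""
    link = urlsplit(url)
    if link.scheme.lower() not in ALLOWED_SCHEMES or link.hostname is None:
        return False
    host = link.hostname
    start_host = urlsplit(start_url).hostname
    sub = subdomain.lower().rstrip(".")
    # Starting hostname, the sub-domain itself, or any host beneath the sub-domain.
    return host == start_host or host == sub or host.endswith("." + sub)


start = "http://www.example.org/news/"
print(in_subdomain_scope("http://videos.cdn.example.org", start, "cdn.example.org"))
print(in_subdomain_scope("http://videos.example.org", start, "cdn.example.org"))
```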

Limit to URL hostname and specified domains

Select this setting to crawl only the URL hostname and the specified domains, using HTTP or HTTPS and any port. For example, if your starting URL is http://www.example.org/news/ and the specified domains are cdn.example.org and site.example.org, all links discovered in www.example.org, cdn.example.org and site.example.org will be crawled. This means URLs such as http://www.example.org/support, https://www.example.org:8080/logout and http://cdn.example.org/images/ will be crawled. Links whose domain does not match the web application URL hostname or one of the specified domains will not be followed. Unlike the sub-domain setting, subdomains of a specified domain are not included. This means http://videos.example.org and http://videos.cdn.example.org will not be crawled.
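The specified-domains rule differs from the sub-domain rule in that it requires an exact hostname match against a fixed list, which is why videos.cdn.example.org falls out of scope here. This is a minimal sketch under that assumption; `in_domains_scope` is a hypothetical helper, not part of the product.

```python
from typing import Iterable
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def in_domains_scope(url: str, start_url: str, domains: Iterable[str]) -> bool:
    """Hypothetical check for 'Limit to URL hostname and specified domains'."""
    link = urlsplit(url)
    if link.scheme.lower() not in ALLOWED_SCHEMES or link.hostname is None:
        return False
    allowed = {urlsplit(start_url).hostname}
    allowed |= {d.lower().rstrip(".") for d in domains}
    # Exact match only: subdomains of a listed domain are not included.
    return link.hostname in allowed


start = "http://www.example.org/news/"
domains = ["cdn.example.org", "site.example.org"]
print(in_domains_scope("http://cdn.example.org/images/", start, domains))
print(in_domains_scope("http://videos.cdn.example.org", start, domains))
```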