Seeding Web Application Scanners with Attack Surface Data

We’ve talked previously on this blog about web application attack surface – why it is important to understand all of the URLs an application will respond to and all of the injection points where an application will “listen” for inputs that will impact its behavior. As part of the Phase 1 Hybrid Analysis Mapping (HAM) research we did for the Department of Homeland Security (DHS), we even created and released a command-line tool that performs a quick static scan of an application’s source code and dumps a list of these URLs and injection points to standard output. This is a handy way to get a quick view of an application’s attack surface. So we decided to take this a step further: what if you could seed your web application scanner with this sort of data? If the scanner knows what the application’s attack surface is, it can perform a much more thorough analysis of the target application.
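To give a feel for what a “quick static scan” involves, here is a deliberately tiny Python sketch for JSP pages. It is not how ThreadFix or the HAM tool is implemented; the regex, the `jsp_attack_surface` function and the assumed web-root layout are illustrative only, and a real tool also has to understand frameworks such as Spring, includes, and parameter names computed at run time.

```python
import re
from pathlib import PurePosixPath

# Toy pattern: only catches request.getParameter("literal") calls. Parameter
# names built at run time, framework bindings (e.g. Spring) and includes are missed.
PARAM_PATTERN = re.compile(r'request\.getParameter\(\s*"([^"]+)"\s*\)')


def jsp_attack_surface(sources: dict[str, str], web_root: str) -> dict[str, set[str]]:
    """Map each JSP page's URL path to the parameter names its source reads.

    `sources` maps file paths to file contents; `web_root` is the directory
    assumed to correspond to the application's context root.
    """
    root = PurePosixPath(web_root)
    surface: dict[str, set[str]] = {}
    for path, text in sources.items():
        p = PurePosixPath(path)
        if p.suffix != ".jsp":
            continue  # only JSP pages become URLs in this toy model
        url = "/" + p.relative_to(root).as_posix()
        surface[url] = set(PARAM_PATTERN.findall(text))
    return surface


if __name__ == "__main__":
    sources = {
        "src/main/webapp/admin.jsp": 'String d = request.getParameter("debug");',
        "src/main/webapp/search.jsp": (
            'String q = request.getParameter( "q" );\n'
            'String d = request.getParameter("debug");'
        ),
        "src/main/webapp/style.css": "body { margin: 0; }",
    }
    for url, params in sorted(jsp_attack_surface(sources, "src/main/webapp").items()):
        print(url, sorted(params))
    # /admin.jsp ['debug']
    # /search.jsp ['debug', 'q']
```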

Why is this? Well – a web application scanner can’t test attack surface if it doesn’t know that the attack surface exists. How do most scanners determine an application’s attack surface? They will typically:

  • Spider the application – Much as Google indexes pages on the Internet, the web scanner analyzes each web page’s HTML, JavaScript and other assets to find links to pages that have not yet been visited, along with any GET or POST parameters that may be passed to those pages. Scanners also look for the cookies an application sets so those can be tested as part of the attack surface as well. For applications that require a user to be logged in to access functionality, this spidering process must be coupled with login and session-management capabilities in the scanner.
  • Guess – Web scanners can also try to make guesses about additional URLs that might be exposed as well as parameters that can be passed in. These may be guesses at common URLs (like an admin/ directory) or permutations of previously identified objects (like looking for /index.php.bak if a site exposes the URL /index.php).
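Both discovery techniques can be sketched with the Python standard library. The `LinkAndParamExtractor` class, the `guess_urls` function and the guess lists below are hypothetical and far simpler than any real scanner: this toy does not execute JavaScript, track cookies or handle logins.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkAndParamExtractor(HTMLParser):
    """Spidering step for a single page: collect link targets and parameter names."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: set[str] = set()
        self.params: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            absolute = urlsplit(urljoin(self.page_url, attrs["href"]))
            # GET parameters that appear in link query strings
            self.params.update(parse_qs(absolute.query, keep_blank_values=True).keys())
            # Record the page itself without query string or fragment
            self.links.add(absolute._replace(query="", fragment="").geturl())
        elif tag == "form" and attrs.get("action"):
            self.links.add(urljoin(self.page_url, attrs["action"]))
        elif tag in ("input", "select", "textarea") and attrs.get("name"):
            # GET or POST parameters submitted by forms
            self.params.add(attrs["name"])


# Illustrative guess lists; real scanners ship much larger dictionaries.
COMMON_PATHS = ("admin/", "backup/")
BACKUP_SUFFIXES = (".bak", ".old", "~")


def guess_urls(known_urls, site_root: str) -> set[str]:
    """Guess common directories plus backup-file permutations of known pages."""
    guesses = {urljoin(site_root, path) for path in COMMON_PATHS}
    for url in known_urls:
        guesses.update(url + suffix for suffix in BACKUP_SUFFIXES)
    return guesses - set(known_urls)


if __name__ == "__main__":
    spider = LinkAndParamExtractor("http://example.test/")
    spider.feed('<a href="/index.php?id=3#top">Home</a>'
                '<form action="/login.php" method="post">'
                '<input name="user"><input name="pass" type="password"></form>')
    spider.close()
    print(sorted(spider.links))   # ['http://example.test/index.php', 'http://example.test/login.php']
    print(sorted(spider.params))  # ['id', 'pass', 'user']
    print("http://example.test/index.php.bak" in guess_urls(spider.links, "http://example.test/"))  # True
```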

That’s all well and good, but what about pages and parameters that are missed by these approaches? Elements can be missed for a variety of reasons:

  • Weaknesses in the spider – No scanner is perfect, so the spidering process can miss exposed attack surface. URLs might be discoverable by analyzing JavaScript, Flash and other site elements, but not all spiders take these into account.
  • Pages with no inbound links – Complex applications may have all sorts of pages that are not exposed to the spidering process. One example is landing pages: entry points into the application that link back into other parts of it, but that no other page links to.
  • Invisible parameters – I once did some work on an application where every page would respond to a “d” parameter in a request by attempting to delete the order identified by that parameter’s value. (Yikes!) This appeared to be utility functionality that a site developer had placed in the application for debugging and convenience and, thankfully, you would never find any “d” parameters when crawling the application. But the behavior was there nonetheless. This isn’t the only time we’ve found application behaviors like this, and when we do find them, the impact of exploiting them is pretty severe.

So seeding application scans with attack surface data gives us the opportunity to jump-start the spidering process and can help get us better scan coverage for applications that expose these sorts of hidden capabilities.
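A toy model shows why seeding helps with pages that have no inbound links: treat the site as a graph of pages and links, and the spider as a breadth-first search over it. The `crawl` function and the page names are made up for illustration. Note that one seed URL uncovers more than the landing page itself; it also lets the crawl continue into pages linked only from that landing page.

```python
from collections import deque


def crawl(link_graph: dict[str, list[str]], start: str, seeds=()) -> set[str]:
    """Breadth-first 'spider' over a link graph.

    The frontier starts with the start page plus any seed URLs; URLs that
    are not in the graph (i.e. would return 404) are skipped.
    """
    frontier = deque([start, *seeds])
    visited: set[str] = set()
    while frontier:
        url = frontier.popleft()
        if url in visited or url not in link_graph:
            continue
        visited.add(url)
        frontier.extend(link_graph[url])
    return visited


# Hypothetical site: /promo.jsp is a landing page that nothing links to,
# and /offer.jsp is only linked from /promo.jsp.
SITE = {
    "/index.jsp": ["/search.jsp", "/about.jsp"],
    "/search.jsp": ["/index.jsp"],
    "/about.jsp": ["/index.jsp"],
    "/promo.jsp": ["/index.jsp", "/offer.jsp"],
    "/offer.jsp": ["/promo.jsp"],
}

if __name__ == "__main__":
    plain = crawl(SITE, "/index.jsp")
    seeded = crawl(SITE, "/index.jsp", seeds=["/promo.jsp"])
    print(sorted(seeded - plain))  # ['/offer.jsp', '/promo.jsp']
```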

Let’s take a look at how this works in ThreadFix for the OWASP ZAP scanner using the Bodgeit Store example application:

  1. Set up an application in ThreadFix and provide a pointer to the application’s source code (see more about this in the Hybrid Analysis Mapping Configuration wiki page).
  2. ThreadFix does a lightweight static analysis of the source code to create a database of mappings between attack surface points and the source code responsible for that attack surface (the screenshot is of ThreadFix’s command-line Hybrid Analysis Mapping tool, to illustrate the underlying analysis).
  3. From OWASP ZAP, you configure the ThreadFix server and API key to be used to pull the attack surface data.
  4. Then you select the application whose attack surface you want to retrieve.
  5. And provide the base URL where the scanning will occur; the relative URLs in the attack surface data are resolved against it.
  6. OWASP ZAP then pulls the attack surface data from ThreadFix, consisting of URLs and GET/POST parameters that will be used to seed ZAP’s spidering and scanning. Note that ZAP now knows about the “admin.jsp” page and multiple “debug” parameters that it would not have found otherwise. This results in a more thorough scan and the identification of vulnerabilities that would otherwise have been missed.
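Roughly speaking, steps 5 and 6 turn relative endpoint paths and parameter names into concrete requests against the base URL. The sketch below assumes a made-up JSON shape, a hypothetical `seed_requests` function and a `/bodgeit` context path; it is not the format ThreadFix’s API returns or how the ZAP plugin works internally.

```python
import json
from urllib.parse import urlencode, urljoin


def seed_requests(surface_json: str, base_url: str, placeholder: str = "1"):
    """Yield (method, url, body) tuples, one per endpoint in the attack surface.

    Assumed (hypothetical) input shape:
    [{"path": "/admin.jsp", "method": "GET", "parameters": ["debug"]}, ...]
    Every parameter gets the same placeholder value; a scanner would later
    replace it with fuzzing payloads.
    """
    base = base_url if base_url.endswith("/") else base_url + "/"
    for endpoint in json.loads(surface_json):
        # Resolve the path relative to the base URL, keeping any context path
        url = urljoin(base, endpoint["path"].lstrip("/"))
        params = {name: placeholder for name in endpoint.get("parameters", [])}
        if endpoint.get("method", "GET").upper() == "POST":
            yield "POST", url, urlencode(params)
        else:
            query = "?" + urlencode(params) if params else ""
            yield "GET", url + query, ""


if __name__ == "__main__":
    surface = """[
      {"path": "/admin.jsp", "method": "GET", "parameters": ["debug"]},
      {"path": "/login.jsp", "method": "POST", "parameters": ["username", "password", "debug"]}
    ]"""
    for request in seed_requests(surface, "http://localhost:8080/bodgeit"):
        print(request)
    # ('GET', 'http://localhost:8080/bodgeit/admin.jsp?debug=1', '')
    # ('POST', 'http://localhost:8080/bodgeit/login.jsp', 'username=1&password=1&debug=1')
```

Stripping the leading slash before joining keeps the application’s context path (here `/bodgeit/`) in the final URL; `urljoin` would otherwise replace it with the absolute path.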

This attack surface calculation provides the scanner with an exhaustive list of all the URLs and parameters it will need to fuzz to get a thorough examination of the application. This isn’t foolproof – a scanner won’t be able to fuzz parts of the application it knows about but doesn’t know how to get to. One example of a situation like this is a multi-step process like an e-commerce checkout. However, this sets the stage for running an analysis of the URLs a scanner should have hit, but did not. Down the road we might look to automate some of this checking as well.
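The “should have hit, but did not” check is essentially a set difference between the calculated attack surface and the scanner’s request history. A minimal sketch, assuming that history has already been reduced to (path, parameter names) pairs; `coverage_gaps` and the sample data are hypothetical, and this is not a ThreadFix feature.

```python
def coverage_gaps(expected: dict[str, set[str]], observed):
    """Compare calculated attack surface with what the scanner actually requested.

    `expected` maps URL paths to parameter names (e.g. from static analysis);
    `observed` is an iterable of (path, parameter names) pairs taken from the
    scanner's request history. Returns (missed paths, missed (path, param) pairs).
    """
    seen_paths: set[str] = set()
    seen_pairs: set[tuple[str, str]] = set()
    for path, params in observed:
        seen_paths.add(path)
        seen_pairs.update((path, p) for p in params)
    missed_paths = set(expected) - seen_paths
    expected_pairs = {(path, p) for path, params in expected.items() for p in params}
    return missed_paths, expected_pairs - seen_pairs


if __name__ == "__main__":
    expected = {
        "/index.jsp": set(),
        "/admin.jsp": {"debug"},
        "/checkout.jsp": {"card", "debug"},  # hypothetical multi-step page
    }
    observed = [("/index.jsp", []), ("/admin.jsp", ["debug"])]
    missed_paths, missed_params = coverage_gaps(expected, observed)
    print(sorted(missed_paths))   # ['/checkout.jsp']
    print(sorted(missed_params))  # [('/checkout.jsp', 'card'), ('/checkout.jsp', 'debug')]
```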

These examples show how this technique can be used with ThreadFix and the open source OWASP ZAP dynamic scanner. We also have a plugin for PortSwigger BurpSuite, and I’ll follow up with a Burp-specific blog post before too much longer.

Calculating application attack surface and feeding it to dynamic scanners is one of the cool new things we’ve done with the 2.0 release of ThreadFix. To download ThreadFix 2.0 and the OWASP ZAP and BurpSuite plugins, check out the ThreadFix download center. As always, please feel free to post any questions to the ThreadFix Google Group and any bugs or feature requests (such as other scanners you’d like to see supported) to the ThreadFix GitHub page.

–Dan

dan _at_ denimgroup.com

@danielcornell

About Dan Cornell

A globally recognized application security expert, Dan Cornell holds over 15 years of experience architecting, developing and securing web-based software systems. As the Chief Technology Officer and a Principal at Denim Group, Ltd., he leads the technology team to help Fortune 500 companies and government organizations integrate security throughout the development process. He is also the original creator of ThreadFix, Denim Group's industry leading application vulnerability management platform.

7 Responses to “Seeding Web Application Scanners with Attack Surface Data”

  1. Daniel

    Something went wrong with your screenshots there, none of them are readable.

  2. dancornell

    Daniel: Images updated – should be a bit more readable now. Thanks for the feedback.

    Daniel Miessler: Cool stuff. Right now we have language/framework-specific support for Java/JSP and Java/Spring. We have a little bit of Python/Django (internally, not yet released) and we’re looking to add support for ASP.NET (C#), PHP and some others.

  3. Sam

    A similar approach:
    Extract visited URLs from Apache access logs. Proxy curl requests through ZAP/Burp. Spider from there.

  4. dancornell

    That would be a good start, but it assumes that traffic to the website has covered all of the landing pages, hidden pages, etc., as well as all the available parameters. I think that would be effective in finding landing pages that might not be seen by a crawl, but it wouldn’t be as effective at finding debug/admin parameters that aren’t in common use. Certainly a better starting point than a raw, uninformed crawl.

  5. Darrell

    Interesting article – I’ve installed ThreadFix Community and tried out the Zap Proxy plugin. The ThreadFix: Import from Source and ThreadFix: Export Scan work fine, but the ThreadFix: Import Endpoints from ThreadFix results in “Failed to retrieve endpoints from ThreadFix. Check your key and url”. Well the key and url must be right, otherwise the import would not work. In the ThreadFix error log I see messages such as “Calculated JSP root to be: null” and “Found 0 JSP files”. I’m thinking I don’t have the source code info for the application setup correctly and can’t find any details on how it should be. Any advice or url to documentation would be helpful. Thanks
