26 October 2010

Benign Unexpected URLs - Part 1 - Missing (404 Not Found Error) Files

Application information gathering, such as the enumeration of directories, files and other resources, is a type of forced browsing. It can be made easier when resources are in predictable locations.

Some requested URLs are likely to come from an attacker who is exploring the application or trying to find a published vulnerability:

  • /.htaccess
  • /index.php?file=../../../../../../../../../etc/passwd
  • /plugins/editors/tinymce/jscripts/tiny_mce/plugins/tiny...
  • /statistik/usage_201007.html
  • /sumthin
  • /FormMail.pl

If these resources do not exist, the requests may result in a 404 status error and be listed in web server and application error logs.
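To see which missing URLs a site actually receives, the 404 entries can be extracted from the web server's access log. The sketch below is a minimal example that assumes the Apache/NGINX "combined" log format. The sample lines and IP addresses are invented for illustration. Requests whose paths contain raw spaces will not match this simple pattern.

```python
import re
from collections import Counter

# Assumption: access log lines use the Apache/NGINX "combined" format.
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" (?P<status>\d{3}) \S+'
)


def missing_paths(lines):
    """Count the request paths that received a 404 Not Found response."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    '203.0.113.9 - - [26/Oct/2010:07:06:00 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [26/Oct/2010:07:06:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [26/Oct/2010:07:07:12 +0000] "GET /FormMail.pl HTTP/1.0" 404 209 "-" "-"',
]
print(missing_paths(sample).most_common())
# [('/favicon.ico', 1), ('/FormMail.pl', 1)]
```

The counts are only a starting point. Each path still has to be judged against the benign categories below.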

As part of my presentation at AppSec Washington DC in two weeks' time, and the related guidance document, I wanted to provide examples of URLs for non-existent resources that publicly exposed web applications may receive (and which therefore generate a 404 status error), but which are not necessarily malicious and probably not even suspicious. This matters because one of the primary benefits of application-level intrusion detection and prevention is its very low false positive rate (benign activity falsely identified as an attack).

What does it mean when URLs are requested that are not among the application's allowable entry points? An attacker might be undertaking reconnaissance, testing defences or looking for vulnerabilities. These are exactly the kinds of activity an intrusion detection system should be looking for. However, the requests may also be benign.

In the tables below, I have set out 20 common types of non-malicious, unexpected missing-file URLs that most public websites could receive. If a site is not exposed to search engines and external parties, this list can be shortened accordingly.

Type A: URLs that Should Not Exist

The first type consists of URLs that are requested on the premise that the page or file does not exist.

Category: Comments and examples
404 Checks: Web crawlers (spiders, robots) and site monitoring tools may request URLs that do not exist in order to check that the response includes a 404 status code.
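A 404 check of this kind typically requests a random path that almost certainly does not exist. It then confirms that the server does not answer with a "soft 404", meaning an error page served with a success status. The following is a hypothetical sketch of such a probe, not any particular crawler's method. The 32-character hexadecimal path is an assumption, but it shows why these requests may appear in logs as random-looking paths.

```python
import secrets
import urllib.error
import urllib.request


def random_missing_url(base_url):
    """Build a URL for a random path that is almost certainly not a real resource."""
    return base_url.rstrip("/") + "/" + secrets.token_hex(16)


def returns_404(url, opener=urllib.request.urlopen, timeout=10):
    """Return True only if requesting url produces an HTTP 404 status."""
    try:
        # urlopen follows redirects and raises HTTPError for non-2xx responses,
        # so returning normally means a 2xx status, e.g. a "soft 404" page.
        # Network failures (URLError) are left to propagate.
        with opener(url, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 404


# Usage (makes a real network request):
# print(returns_404(random_missing_url("https://www.example.com/")))
```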


Type B: Assumed Valid URLs

In this type, URLs are requested on the mistaken assumption that they exist.

Category: Comments and examples
Old URLs: Search engines may have indexed URLs that have since been moved or removed. Such URLs may also be referenced in indexes, users' bookmarks, other websites and office documents. This category also includes user-generated content, such as profiles and images, that has subsequently been deleted. Temporary URLs created by the application (e.g. for time-limited report generation) and expired URLs provided to third parties also need to be considered.

New URLs: Links to new application entry points may be published before the resource exists, either during change management processes or by mistake.

Unacceptable URLs: Some applications and websites may exist in multiple contexts. For example, there may be an internal version of a corporate website with additional URLs (e.g. a staff directory) that are not present on the external public site. Internal users may find this harder to notice if DNS is configured so that they see the internal site on the real domain. Links to these internal-only URLs may then be used from outside the organisation or sent to third parties.

URL Rewriting: URL rewriting is often used to present human-readable addresses. These may include directory names (e.g. years, months and days for blog entries) that generate a not found error when requested on their own. For example, for /2010/10/26/Benign-Unexpected-URLs-Part-1-Missing-Files, the paths /2010/10/26/, /2010/10/ and /2010/ may each be requested.


Code Libraries: Relative URLs in third-party code such as style sheets and JavaScript libraries may reference local content such as images and fonts. If these files have not been copied to the site along with the library, missing file errors will occur.

Device Specific: Some browsing devices may try to find an alternative version of a site (e.g. one optimised for mobile devices) by making requests that lead to not found errors.


Ownership Verification: Files are sometimes added to the site (often in the root) to verify site ownership, e.g. for advertising services, webmaster tools, and uptime and anti-malware monitoring. The originating service may request these files again after they have been removed.

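Policy Files: Policy files, such as the Adobe Flash /crossdomain.xml, the Microsoft Silverlight /clientaccesspolicy.xml and the P3P /w3c/p3p.xml, may be requested even if they do not exist.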


Robots Exclusion: Search engine crawlers will request the robots exclusion file (/robots.txt) even if it does not exist.


Site Maps: Web crawlers and scanners may automatically request sitemap files (e.g. /sitemap.xml) that do not exist.


Favicons: Web browsers may automatically request favicon images (e.g. /favicon.ico) for display in the address bar and to help users identify bookmarks visually. Several file names and extensions may be requested until a valid file is found.


News Feeds and Trackbacks: News readers and aggregators may try, unsuccessfully, to guess RSS and Atom feed URLs. Requests for /trackback may also be received.


Other Associated Files: Web browsers and their plug-ins may request associated files that do not exist, e.g. for /example.pdf.


Toolbars: Toolbars such as Discuss for Internet Explorer/Microsoft Office may request URLs that do not exist.


Malformed URLs: Poorly built web crawlers may request URLs that do not exist because they parse links incorrectly.


Previous Site: A domain or IP address may previously have been used for a completely different website or application. If so, missing file errors may be reported for URLs that existed on the previous site.
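Several of the Type B categories involve fixed, well-known paths. The URL Rewriting category can be derived from the site's own valid URLs. The sketch below shows one way an application-level detection system might label a 404 path as probably benign. The path list is illustrative and far from exhaustive, and the category names mirror the table above. `classify_missing` and `parent_paths` are hypothetical helpers, not part of any product.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative, non-exhaustive paths that clients commonly request whether or
# not they exist, keyed to the Type B categories above. Extend from your logs.
BENIGN_PATHS = {
    "/robots.txt": "Robots Exclusion",
    "/sitemap.xml": "Site Maps",
    "/favicon.ico": "Favicons",
    "/apple-touch-icon.png": "Favicons",
    "/apple-touch-icon-precomposed.png": "Favicons",
    "/crossdomain.xml": "Policy Files",
    "/clientaccesspolicy.xml": "Policy Files",
    "/w3c/p3p.xml": "Policy Files",
    "/trackback": "News Feeds and Trackbacks",
}


def parent_paths(valid_paths):
    """Return the ancestor directories (with a trailing slash) of valid URL paths."""
    parents = set()
    for path in valid_paths:
        for parent in PurePosixPath(path).parents:
            if str(parent) != "/":
                parents.add(str(parent) + "/")
    return parents


def classify_missing(requested, valid_paths):
    """Suggest a benign Type B category for a 404 request, or None if unrecognised."""
    path = urlsplit(requested).path  # ignore any query string
    if path in BENIGN_PATHS:
        return BENIGN_PATHS[path]
    # URL Rewriting: /2010/10 and /2010/10/ are both plausible requests for a
    # directory that only exists as part of a rewritten URL.
    directory = path if path.endswith("/") else path + "/"
    if directory in parent_paths(valid_paths):
        return "URL Rewriting"
    return None
```

Suppose the post URL /2010/10/26/Benign-Unexpected-URLs-Part-1-Missing-Files is the only valid path. Then /2010/10 is labelled "URL Rewriting" and /robots.txt is labelled "Robots Exclusion". /FormMail.pl returns None, which leaves it for further scrutiny. Recomputing the parent directories on every call keeps the sketch short. A real system would probably cache them.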

Type C: Incorrect URLs

Some URLs are not, and never have been, valid entry points, but are nevertheless requested, usually because of a user mistake of some sort such as a transcription error.

Category: Comments and examples
Truncation: URLs can become truncated in emails due to line wrapping, or may be referenced incorrectly in other files such as PDFs. Alternatively, the URLs may be complete but contain additional whitespace characters (e.g. carriage return, line feed, tab, space).


Extra Punctuation: URLs may be requested with an additional punctuation suffix (e.g. a trailing full stop or closing bracket). This can happen when an address in a sentence has been hyperlinked incorrectly.


Mis-Spelling: URLs may be published, copied or typed incorrectly.


Case Sensitivity: URLs may be typed or requested in a letter case that does not match the one used on the server.


Type D: Testing

Your own functionality and security testing, or testing services performed on your behalf, will request URLs that do not exist. These might also be considered benign if the scanning is authorised and from a known source.

Category: Comments and examples
Testing: Scanners and manual testing may request numerous URLs for several purposes: to enumerate applications, to examine response messages and codes, to test access control, and to identify unknown files and directories, including those containing old and backup files.
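Part of the question of whether testing traffic is "authorised and from a known source" can be answered by comparing the client address with the ranges your testers or testing service use. This is a minimal sketch using Python's standard `ipaddress` module. The networks shown are documentation ranges standing in for real ones. A source address on its own is weak evidence: behind a proxy, for example, the logged address may not be the tester's.

```python
import ipaddress

# Hypothetical ranges for authorised scanners; replace them with the ranges
# agreed with your own testers or testing service.
AUTHORISED_TESTERS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("2001:db8:10::/48"),
]


def is_authorised_tester(client_ip, networks=AUTHORISED_TESTERS):
    """Return True if client_ip falls inside one of the authorised networks."""
    address = ipaddress.ip_address(client_ip)
    # `in` evaluates to False when the address and network IP versions differ.
    return any(address in network for network in networks)
```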

Some of these could be requested by an attacker, but on their own they are usually not enough to identify one. Type D requests from an unknown source, or from unauthorised testing, should not be assumed to be benign. Items such as incorrect case, truncation and mis-spelling (Type C) cannot be predicted in advance, but if they occur it may be necessary to add custom redirections to the correct URLs.
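When Type C mistakes recur, candidate redirect targets can be suggested automatically and then reviewed by a person before any redirection is added. The sketch below works through the Type C categories in turn. It handles whitespace and trailing punctuation, then letter case, then truncation, then mis-spelling using the standard library's `difflib` similarity matching. `suggest_redirect`, the punctuation set and the 0.8 similarity cutoff are all assumptions to be tuned against real logs.

```python
import difflib
from urllib.parse import unquote

# Characters assumed to be caught up in a hyperlink at the end of a sentence.
TRAILING_PUNCTUATION = ".,;:!?)]}'\""


def suggest_redirect(requested, valid_paths, cutoff=0.8):
    """Suggest a valid path for a mistaken (Type C) request, or None."""
    # Whitespace: decode %0A, %20 etc. and drop all whitespace characters.
    # Extra Punctuation: strip punctuation from the end of the path.
    cleaned = "".join(unquote(requested).split()).rstrip(TRAILING_PUNCTUATION)
    if not cleaned:
        return None
    if cleaned in valid_paths:
        return cleaned

    # Case Sensitivity: compare paths ignoring letter case.
    by_lower = {path.lower(): path for path in valid_paths}
    if cleaned.lower() in by_lower:
        return by_lower[cleaned.lower()]

    # Truncation: accept only a single valid path that begins with the request.
    prefixed = [path for path in valid_paths if path.startswith(cleaned)]
    if len(prefixed) == 1:
        return prefixed[0]

    # Mis-Spelling: the most similar valid path above the cutoff, if any.
    close = difflib.get_close_matches(cleaned, list(valid_paths), n=1, cutoff=cutoff)
    return close[0] if close else None
```

The order of the checks is a design choice: cheaper, more certain corrections are tried before fuzzy matching. The truncation check refuses to guess when more than one valid path shares the prefix.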

If you have further suggestions and examples, please share them. Tomorrow, I will extend the discussion of benign URLs to valid (non-missing) URLs.

Update 8th November 2010: Following a direct message, I have added /trackback and URLs referenced by client-side style and code libraries. Link to 'Tomorrow' enabled in last paragraph.

Posted on: 26 October 2010 at 07:06 hrs


Please read our terms of use and obtain professional advice before undertaking any actions based on the opinions, suggestions and generic guidance presented here. Your organisation's situation will be unique and all practices and controls need to be assessed with consideration of your own business context.

Terms of use https://www.clerkendweller.uk/page/terms
Privacy statement https://www.clerkendweller.uk/page/privacy
© 2010-2015 clerkendweller.uk