This article applies to:
- How does Proxy Caching work in WebMarshal?
Most HTTP proxies work in the same basic manner. The RFC document for HTTP (RFC 2616) includes a set of rules that indicate which content can and cannot be cached. The RFC also defines a group of HTTP headers that allow web servers to indicate whether each file is cacheable and for how long it may be cached.
The WebMarshal proxy cache provides a caching implementation within the bounds of the RFC.
WebMarshal does not currently cache FTP or HTTPS traffic. FTP traffic is not covered by the HTTP RFC, and the RFC for FTP does not specify any mechanism for defining caching information. HTTPS traffic is specifically excluded because it is expected to be secure, and there is a risk of information disclosure if pages are cached incorrectly.
When a request arrives for an HTTP resource, WebMarshal checks the cache directory on disk for a cached response to the requested URL. If a response has been cached, WebMarshal then checks whether the response is "fresh" enough to be returned to the client. If the response is "fresh", WebMarshal passes the cached content to the engine for policy processing. If the response is "stale", or no response has been cached, WebMarshal contacts the web server and requests the item again. If the web server indicates that WebMarshal's cached copy is still valid, WebMarshal uses the cached data. Otherwise, WebMarshal downloads the new response from the web server.
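The lookup sequence above can be sketched in Python. This is a minimal illustration only, not WebMarshal's implementation: the names `CachedResponse`, `is_fresh`, `handle_request`, and the `fetch` callback are all hypothetical, the cache is an in-memory dictionary rather than a directory on disk, and freshness is reduced to a single `max_age` value rather than the full set of RFC rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedResponse:
    body: bytes
    stored_at: float             # time the entry was stored or last revalidated
    max_age: float               # freshness lifetime in seconds (assumed model)
    etag: Optional[str] = None   # validator sent back to the server when stale


# Hypothetical origin fetch: (url, etag) -> (status, body, max_age, etag).
# A status of 304 means "your cached copy is still valid".
Fetch = Callable[[str, Optional[str]], Tuple[int, bytes, float, Optional[str]]]


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """An entry is fresh while its age is below its freshness lifetime."""
    return (now - entry.stored_at) < entry.max_age


def handle_request(url: str, cache: Dict[str, CachedResponse],
                   fetch: Fetch, now: float) -> bytes:
    """Return the body that would be handed on for policy processing."""
    entry = cache.get(url)

    # Fresh hit: serve from the cache without contacting the web server.
    if entry is not None and is_fresh(entry, now):
        return entry.body

    # Stale or missing: ask the server, passing the validator if we have one.
    status, body, max_age, etag = fetch(url, entry.etag if entry else None)

    # The server confirmed the cached copy: refresh its lifetime and reuse it.
    if status == 304 and entry is not None:
        entry.stored_at = now
        entry.max_age = max_age
        return entry.body

    # Otherwise store and return the newly downloaded response.
    cache[url] = CachedResponse(body, now, max_age, etag)
    return body
```

In this sketch the returned body always goes on to policy processing, whether it came from the cache or from the server, which mirrors the point that caching does not bypass the engine.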
The proxy cache does not affect how the WebMarshal policy is applied. When a response is retrieved from the cache, it is still written as a file to the proxy's temp directory and then sent to the engine for processing. The result of rule processing is not cached, because changes to rules or updates to virus scanner data files would immediately invalidate any cached result.
WebMarshal's cache implementation also has some features that can make life easier for system administrators.
While most web sites operate within the bounds of the RFC, there are always some that are incorrectly configured or coded. This can lead to situations where content is cached when it shouldn't be. WebMarshal allows sites to be excluded from caching altogether in order to work around such problems. (You can configure this list in the WebMarshal Console, Properties > Proxy Cache.)
When the allocated cache space is nearly full, WebMarshal automatically manages the content. The least recently used items are removed to make room for new content. Note that the original download date of the file is not relevant to pruning. Older files will be retained in the cache if they are requested often and have not changed.
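The least-recently-used policy described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not WebMarshal's code: the `LruCache` class and its methods are hypothetical, items live in memory rather than on disk, and pruning starts only when the byte limit is exceeded rather than when the cache is "nearly full".

```python
from collections import OrderedDict
from typing import List, Optional


class LruCache:
    """Byte-limited cache that evicts the least recently used item first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Ordered from least recently used (front) to most recently used (back).
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        body = self._items.get(url)
        if body is not None:
            # A hit counts as a "use", so the item moves to the back.
            self._items.move_to_end(url)
        return body

    def put(self, url: str, body: bytes) -> None:
        # Replacing an existing entry first releases its old size.
        if url in self._items:
            self.used_bytes -= len(self._items.pop(url))
        self._items[url] = body
        self.used_bytes += len(body)
        # Prune from the front until the cache fits its allocation again.
        while self.used_bytes > self.capacity_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)

    def urls(self) -> List[str]:
        """URLs in eviction order, least recently used first."""
        return list(self._items)
```

Note that the sketch never looks at when an item was first downloaded. Only the order of use matters, so an old item that is requested often keeps moving to the back and is not pruned.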
Cache Directory Organization
The cache directory is organized in a logical manner that makes it easy to find the files for a given domain. Directories are organized by top-level domain, then the first three characters of the domain name, then the full domain name. For example, all pages for www.m86security.com are stored within com\m86\www.m86security.com. If cache information for a domain is ever corrupted, the system administrator simply needs to locate the directory for that domain and delete it. The cache will carry on as if those files never existed.
- The administrator can also verify the cache contents and perform other advanced tasks using the Cache Tool. See Trustwave Knowledgebase article Q12724.
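As an illustration of the directory layout, the following Python sketch derives the relative cache path from a host name. The function name `cache_dir_for_host` is hypothetical, and the rule it applies is inferred from the single example above: the top-level domain is the last label, and the three-character prefix comes from the label immediately before it. How WebMarshal treats IP addresses or multi-part suffixes such as co.uk is not stated here, so the sketch does not attempt to model those cases.

```python
from pathlib import PureWindowsPath


def cache_dir_for_host(host: str) -> PureWindowsPath:
    """Build the relative cache directory for a host name (assumed rule)."""
    host = host.lower().rstrip(".")
    labels = host.split(".")
    if len(labels) < 2:
        raise ValueError("expected a host name with at least two labels")

    tld = labels[-1]          # e.g. "com"
    prefix = labels[-2][:3]   # first three characters, e.g. "m86"
    return PureWindowsPath(tld, prefix, host)


# Prints com\m86\www.m86security.com, matching the example in the text.
print(cache_dir_for_host("www.m86security.com"))
```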
Cache URL Rewriting
YouTube presents a problem for caching proxies. Even though its video files are marked as cacheable for an hour, the URL changes every time the video is played. Because the URL changes, a proxy cannot know whether the content will be the same, so it has to download the video again. The WebMarshal cache contains URL rewriting rules for YouTube videos that strip out the changing part of the URL. This allows WebMarshal to serve the same video to multiple users without downloading it from the server each time.
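The general idea of URL rewriting for caching can be sketched as follows. This is not the actual WebMarshal rule set, which is not described in this article: the function `cache_key`, the example host, and the parameter names in `VOLATILE_PARAMS` are all made up for illustration. The sketch removes query parameters that are assumed to change on every playback, so that two requests for the same video map to the same cache key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names for per-playback parameters. Real sites use their own.
VOLATILE_PARAMS = {"expire", "signature", "token"}


def cache_key(url: str) -> str:
    """Return a normalized URL with the volatile query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in VOLATILE_PARAMS
    ]
    # Sort so that parameter order does not produce different keys.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


first = cache_key("http://video.example.com/play?id=abc123&signature=111&expire=1")
second = cache_key("http://video.example.com/play?expire=2&id=abc123&signature=222")
print(first == second)  # both requests map to the same cache entry
```

The design choice in this sketch is to rewrite only the key used for the cache lookup. The original URL would still be used for any request to the web server, since the server may require the parameters that were stripped.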
For some best practice technical recommendations about caching, see Trustwave Knowledgebase article Q12720.