How to configure a Squid proxy to access both Internet and internal websites

Jephe Wu -  http://linuxtechres.blogspot.com


Environment: the company (domain.com) LAN is connected to the Internet through a leased line. The proxy server on that link (10.0.0.1/24), which is also a firewall, runs Squid 2.6 so that LAN users can access the Internet. A second proxy on the LAN (10.0.0.2/24), also a firewall, is connected to the company headquarters office through another leased line so that users can access some internal websites.

Objective: users configure only 10.0.0.1 as their proxy and use it for both external and internal websites. For internal websites, 10.0.0.1 uses 10.0.0.2 as its parent proxy.

company external websites: *.domain.com except for jephe1.domain.com and jephe2.domain.com
company internal websites: *.internal.domain.com, jephe1.domain.com and jephe2.domain.com


Steps:
1. Configure Squid on 10.0.0.1 by adding the following to squid.conf:

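# listen for LAN clients on 10.0.0.1, port 8080
http_port 10.0.0.1:8080

# 10.0.0.2 is a parent: HTTP port 8080, ICP port 3130 (unused because of no-query)
cache_peer 10.0.0.2 parent 8080 3130 no-query
# send only requests for these domains to the parent
cache_peer_domain 10.0.0.2 .internal.domain.com jephe1.domain.com jephe2.domain.com
# the same domains as an ACL
acl internal dstdomain .internal.domain.com jephe1.domain.com jephe2.domain.com
# never fetch these domains directly (this also covers HTTPS requests)
never_direct allow internal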

After reloading Squid (for example with squid -k reconfigure), users need only the proxy 10.0.0.1:8080 to access both the Internet and all internal websites.
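The routing this configuration sets up can be modelled in a few lines of Python. This is a minimal sketch, not Squid's own code: it approximates dstdomain matching (a pattern with a leading dot matches that domain and all of its subdomains; a pattern without one matches only that exact host) and assumes every non-matching host is fetched DIRECT by 10.0.0.1. The names INTERNAL, PARENT, dstdomain_match and next_hop are hypothetical.

```python
# Hypothetical model of the routing configured above; not Squid source code.
INTERNAL = [".internal.domain.com", "jephe1.domain.com", "jephe2.domain.com"]
PARENT = "10.0.0.2:8080"


def dstdomain_match(host: str, pattern: str) -> bool:
    """Approximate Squid dstdomain matching for a single pattern."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        # ".internal.domain.com" matches internal.domain.com and any subdomain
        return host == pattern[1:] or host.endswith(pattern)
    # a pattern without a leading dot matches only that exact host
    return host == pattern


def next_hop(host: str) -> str:
    """Return the parent for internal hosts and 'DIRECT' for everything else."""
    if any(dstdomain_match(host, p) for p in INTERNAL):
        return PARENT
    return "DIRECT"


if __name__ == "__main__":
    for h in ["wiki.internal.domain.com", "jephe1.domain.com",
              "www.domain.com", "www.example.org"]:
        print(h, "->", next_hop(h))
```

Running it prints 10.0.0.2:8080 for the two internal hosts and DIRECT for www.domain.com and www.example.org, matching the split between external and internal websites listed above.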

Notes:
1. According to http://quark.humbug.org.au/publications/squid/introsquid.html, one important concept to understand is the difference between parents and siblings. A sibling is a cache that your proxy, on receiving a request for a URL, queries to see whether it has a copy. The sibling answers either "Yes, I have it" or "No, I don't have it", and your proxy then decides whether to retrieve the object from that sibling or fetch it from the source directly. A parent is a proxy that your proxy turns to when none of the siblings has a copy: instead of fetching the object directly, your proxy sends the request to the parent and asks it to get a copy.
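The sibling/parent lookup order described in this note can be sketched in Python. This is a simplified illustration under stated assumptions: a dictionary lookup stands in for the ICP "do you have it?" query, and the peer names are hypothetical. In this article's setup there are no siblings and ICP is disabled with no-query, so requests for internal domains go straight to the parent.

```python
from typing import Dict, Optional, Set


def choose_source(url: str, siblings: Dict[str, Set[str]],
                  parent: Optional[str]) -> str:
    """Pick where to fetch `url` from, following the sibling/parent description."""
    # Ask each sibling whether it has the object (stands in for an ICP query).
    for name, cached_urls in siblings.items():
        if url in cached_urls:  # the sibling answers "Yes, I have it"
            return name
    # No sibling has it: ask the parent to fetch it, if one is configured...
    if parent is not None:
        return parent
    # ...otherwise fetch it from the origin server directly.
    return "DIRECT"


if __name__ == "__main__":
    sibs = {"sibling-a": {"http://example.org/a"}}  # hypothetical sibling cache
    print(choose_source("http://example.org/a", sibs, "parent-1"))
    print(choose_source("http://example.org/b", sibs, "parent-1"))
    print(choose_source("http://example.org/b", {}, None))
```

The three calls illustrate the three outcomes: a sibling hit, a fallback to the parent, and a direct fetch when there is no parent.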

2. The never_direct directive forces both HTTP and HTTPS requests for those internal domains to go through the parent proxy. Without it, if an HTTP request to an internal website is redirected to HTTPS, 10.0.0.1 will try to fetch the HTTPS request directly instead of going through the parent. With never_direct configured, the HTTPS request after the redirect also goes through the parent proxy.
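The effect of never_direct can be illustrated with a simplified Python model. It assumes, as a simplification of Squid's peer selection, that without never_direct only GET requests are offered to the parent, while other methods such as CONNECT (used for HTTPS through a proxy) go direct. The function next_hop and its parameters are hypothetical, not part of Squid.

```python
PARENT = "10.0.0.2:8080"


def next_hop(method: str, is_internal: bool, never_direct: bool) -> str:
    """Simplified model of where 10.0.0.1 sends a request (an assumption, not Squid source)."""
    if not is_internal:
        # cache_peer_domain only allows the parent for internal domains
        return "DIRECT"
    if never_direct:
        # "never_direct allow internal": going direct is forbidden
        return PARENT
    # Without never_direct, only GET is modelled as going to the parent;
    # CONNECT (HTTPS) and other methods go direct.
    if method.upper() == "GET":
        return PARENT
    return "DIRECT"


if __name__ == "__main__":
    for nd in (False, True):
        for m in ("GET", "CONNECT"):
            print(f"never_direct={nd} {m}: {next_hop(m, True, nd)}")
```

The only combination that goes DIRECT for an internal host is CONNECT without never_direct, which is the redirect-to-HTTPS case described in this note.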

3. no-query is a cache_peer option that disables ICP queries to this peer, so the ICP port (3130) in the cache_peer line is not used.