
Scam Filtering with a Proxy

The Idea
First, you need a proxy that can handle whatever the load will be. For this discussion, I will assume a building or school, but the idea could easily be scaled to the size of an ISP network by using multiple proxies for different areas. The proxy checks each request to see whether the domain name is registered as a phishing site. Next it does several other checks, including a reverse DNS lookup if the host is an IP address and a check of the country of origin, and develops a score much like most spam software does.

If the URL is a registered phishing site, or anything else prohibited, the proxy will obviously block the page. Nothing new there. The new part comes into play with the phishing score and built-in AI, which will:
    1. Remove all JavaScript and Flash from the page.
    2. Display an HTML element at the top of the page that explains the page might be fake, with a button to close the page if the person does not care.
In this way, pages found to be scam sites due to high scores (for instance, a page from China linking to images from paypal.com) will be rendered static by removing the JavaScript, and a warning will be placed right on the page. The removal of JavaScript is important because JavaScript can remove warnings or rewrite the entire page on load. The way the score is calculated is what is really important.
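As a rough illustration of the cleaning step, here is a minimal sketch in Python using only the standard library's html.parser. The class name StaticPageCleaner, the function clean_page, and the banner markup are my own assumptions for the example, not part of any existing proxy. It drops script, object, and embed elements, strips inline event handlers and javascript: URLs, and puts a warning right after the body tag. The close button is left out, since a real one would need a script-free way to work.

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with everything inside them (assumed list).
DROP_WITH_CONTENT = {"script", "object"}
# Elements with no content that are dropped on their own (Flash embeds).
DROP_SINGLE = {"embed"}

# Hypothetical banner markup; wording and styling are placeholders.
WARNING_HTML = (
    '<div style="background:#fdd;border:2px solid #c00;padding:8px">'
    "Warning: this page may be a fake copy of another site."
    "</div>"
)


class StaticPageCleaner(HTMLParser):
    def __init__(self):
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0
        self.banner_added = False

    def _clean_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # inline handlers such as onload or onclick
            if value and value.strip().lower().startswith("javascript:"):
                continue  # javascript: URLs in href, src, and so on
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        return "".join(" " + attr for attr in kept)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in DROP_SINGLE:
            return
        self.out.append(f"<{tag}{self._clean_attrs(attrs)}>")
        if tag == "body" and not self.banner_added:
            self.out.append(WARNING_HTML)
            self.banner_added = True

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in DROP_SINGLE:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")

    # Comments are not re-emitted: handle_comment is deliberately not overridden.


def clean_page(html_text):
    cleaner = StaticPageCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    if not cleaner.banner_added:
        # No <body> tag was seen, so put the warning at the very top.
        cleaner.out.insert(0, WARNING_HTML)
    return "".join(cleaner.out)
```

A real proxy would also have to deal with character encodings, badly broken markup, and scripts loaded through stylesheets, none of which this sketch attempts.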

Calculating the Score
First, check known databases of URLs (caching the results too); a match gets a score of 100/100. Otherwise, check the local list of domain names and scores to see if the score is known and up to date. If it is unknown or too old, recalculate. Note that the score exists for the domain name, even though it is calculated from the exact URL. This is because hacked servers usually hold multiple phishing sites, so until the score is recalculated, the entire domain should be blocked.
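The lookup order described above could be sketched like this in Python. Everything named here is hypothetical: the DomainScores class, the one-day freshness window, and the calculate callback that stands in for the real scoring step. Storing a score of 100 for the whole domain after a database match is my reading of the per-domain rule, not something spelled out above.

```python
import time
from urllib.parse import urlsplit

MAX_AGE_SECONDS = 24 * 60 * 60  # assumed freshness window of one day


class DomainScores:
    def __init__(self, known_phishing_urls, calculate,
                 max_age=MAX_AGE_SECONDS, clock=time.time):
        self.known = set(known_phishing_urls)  # stand-in for the URL databases
        self.calculate = calculate             # hypothetical scoring function
        self.max_age = max_age
        self.clock = clock
        self.scores = {}                       # domain -> (score, time stored)

    def score_for(self, url):
        domain = (urlsplit(url).hostname or "").lower()
        now = self.clock()
        # 1. A match in the known databases always scores 100.
        if url in self.known:
            self.scores[domain] = (100, now)
            return 100
        # 2. Otherwise reuse the domain's score if it is still fresh.
        cached = self.scores.get(domain)
        if cached is not None and now - cached[1] <= self.max_age:
            return cached[0]
        # 3. Unknown or too old: recalculate from the exact URL,
        #    but store the result for the whole domain.
        score = self.calculate(url)
        self.scores[domain] = (score, now)
        return score
```

The clock parameter exists only so the expiry logic can be exercised without waiting; a real proxy would also want the table to survive restarts.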

If a score is to be calculated, start with the country of origin. Although the country does not affect the score directly, it will matter to other steps. Next, check the links on the page. If the majority of links go to another site, check if that site is probably a bank. Also check if the site linked to has a good majority of the exact same HTML. If so, the page would seem to be copying it, and thus is probably a scam site. Next, check any forms. Any text near a form element such as "PIN", "credit card", or "Expiration Date" means it could be a scam. Next, check the images or other files linked to by the page. If they come from another server, especially one that is found to be a bank, it could be a scam. Next, check the URL. If the URL contains a directory that begins with a period (meaning it is a hidden directory), it is probably a scam. If the URL uses an IP address instead of a domain name, it is probably a scam.
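A few of these checks are simple enough to sketch in Python with the standard library. The function names and the word list are assumptions made for the example; the country lookup, the "is it a bank" test, and the HTML comparison are left out because they need outside data.

```python
import ipaddress
import re
from urllib.parse import urljoin, urlsplit

# Assumed list of phrases that suggest a form is asking for sensitive data.
SENSITIVE_WORDS = ("pin", "credit card", "expiration date")


def uses_ip_address(url):
    """True if the URL's host is an IP address rather than a domain name."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def has_hidden_directory(url):
    """True if any directory in the path starts with a period."""
    segments = urlsplit(url).path.split("/")
    # The last segment is the file name, so only the earlier ones are checked.
    return any(
        seg.startswith(".") and seg not in (".", "..")
        for seg in segments[:-1]
    )


def sensitive_form_words(text_near_form):
    """Return the sensitive phrases found as whole words in the text."""
    lowered = text_near_form.lower()
    return [
        word for word in SENSITIVE_WORDS
        if re.search(r"\b" + re.escape(word) + r"\b", lowered)
    ]


def off_site_resources(page_url, resource_urls):
    """Return the resources (images, files) served from a different host."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    found = []
    for resource in resource_urls:
        # Relative references are resolved against the page URL first.
        host = urlsplit(urljoin(page_url, resource)).hostname
        if host and host.lower() != page_host:
            found.append(resource)
    return found
```

Note that the hidden-directory test would also flag legitimate paths such as /.well-known/, so a real filter would probably need a short list of exceptions.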

Many other checks can be written, but these are the major ones in my eyes. After that, you can calculate a score, and clean the HTML if needed. I'm not saying exactly how to calculate the score yet, but I hope to do that later on, since this idea is in an early phase.
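Purely as a placeholder until the real formula exists, one possible way to combine check results is a capped weighted sum, in the spirit of spam filters. The weights, the check names, and the warning threshold below are all invented for illustration and have not been tuned against anything.

```python
# Invented weights: each check that fires adds its weight to the score.
WEIGHTS = {
    "links_mostly_to_bank": 25,
    "copies_other_site_html": 30,
    "asks_for_sensitive_data": 20,
    "loads_bank_resources": 25,
    "hidden_directory": 30,
    "ip_address_url": 30,
}

WARN_THRESHOLD = 50  # assumed cut-off for cleaning the page and adding a warning


def combine(findings):
    """Turn {check name: True/False} into a score from 0 to 100."""
    # An unknown check name raises KeyError, which catches typos early.
    total = sum(WEIGHTS[name] for name, hit in findings.items() if hit)
    return min(100, total)


def should_warn(score):
    return score >= WARN_THRESHOLD
```

For example, a URL that uses an IP address and a hidden directory would score 60 under these made-up weights, which is enough to trigger the warning.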


