Bill brings up some interesting points in his comment on this thread about the search engines blocking the pages that pass rank and/or link out from XSS-exploitable pages.
In some cases, I think this is possible. For instance, there's one XSS exploit that I used for almost a year that gave me backlinks to my sites from search boxes on other sites. I'm sure it's still useful in quantity, but I have noticed that some of the search engines don't like those particular links anymore. It might be because they're blocking them, because they don't like the lengthy URLs, or because there's not much content on those pages.
Regardless of the reason, however, there are still a ton of wiki/tiki/blog/whatever applications with XSS exploits out there. Yes, yes, I know it's all the rage to talk about XSS exploits right now. Bear with me.
One solution that people have come up with is for programmers to code better and to not have the holes in the first place.
I've been a programmer for over 15 years now and, unfortunately, in the entire time I've been coding, security issues are usually the LAST thing that people think about. Security doesn't affect the functionality of an application, so it gets treated as a money sink.
So, we can hope and pray that the programmers of the thousands of applications out there with XSS holes will write, test, and debug their code so that it doesn't have XSS exploits in it, or we can hope for a more global solution, like the search engines taking action in some way.
One way, as Bill suggests, would be for those pages to get banned. The search engines have already done this if you link out to "bad" neighborhoods. This will have some effect, but there are plenty of XSS exploits the search engines can't detect, and the spammers will use them. They've got tools, which aren't too difficult to write, that will go out and find exploitable sites for them to spam later.
Another hope is that people will upgrade their software to get the XSS fixes in a new version. Yeah, it'll work for some people... but I can tell ya now, there will *always* be some that don't/won't/can't upgrade - for whatever reason. And as long as that is the case, there will always be some XSS exploits around.
To put it bluntly, I really don't see a good solution to any of this yet. There's lots of finger-pointing going on while the spammers happily continue to use the techniques talked about here, or the new ones that are found every day...
G-Man
I dunno, I'm no master programmer, but how hard is it to write a cleanme($string) function for all user input? Strip out anything that's not a character you'd use in a physical address, phone number, or email and you should be golden. Maybe I'm a lazy spammer, but write stuff like that once, throw it in your function library, and you have it foreva'
Not hard at all with regular expressions :P
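Something like this, maybe -- a bare-bones sketch of that cleanme() idea in PHP. The function name and the whitelist are just the hypothetical from above, and a real app should still escape on output as well:

```php
<?php
// Whitelist the characters you'd expect in an address, phone number, or
// email and drop everything else. Purely a sketch of the idea above.
function cleanme($string) {
    // Keep letters, digits, whitespace, and a few address/email-safe
    // punctuation marks; anything else (including < > " ') is stripped.
    return preg_replace('/[^A-Za-z0-9\s@.,#+-]/', '', $string);
}

// The tag characters never make it into storage or back onto the page:
echo cleanme('123 Main St. #4 <script>alert(1)</script>');
// -> "123 Main St. #4 scriptalert1script"
```

Of course, a whitelist that tight mangles some legitimate input (names with apostrophes, URLs in a comment field), which is part of why so many apps end up with leaky hand-rolled filters instead.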
G-Man
Wanna bet $50 that this thread gets slashdotted?
Ironic, isn't it... in many cases, the engines themselves are a great tool for spammers (and worse) to seek out and find sites to exploit.
Ironic, yes. Surprising, no. Google is a software tool for searching for information from web pages. It's built to find this kind of information just as much as it is to find information about Viagra or Britney Spears. To the Google search engine it's all just web content. The nature of search in all forms is that there are obvious and not-so-obvious ways to use it. The less obvious ways tend to be the more "powerful" and interesting in their results. Google is also a great engine for finding illegal music, movies, software, porn, etc... Google is more apt to block sites with illegal content than sites with XSS vulnerabilities.
Yep. Told ya the scripts were pretty easy to write to find this stuff...
G-Man
Speaking of XSS, I found an article on Slashdot claiming the following percentages of sites suffering from web application vulnerabilities:
1. Cross-site scripting (21.5%)
2. SQL injection (14%)
3. PHP includes (9.5%)
4. Buffer overflows (7.9%)
He provided a clever technique for detecting vulnerable pages. He used the Google API to look for sites that have ?id=x (where x is some integer) in the URL. He substituted a bit of SQL in place of x, fetched the page, and looked for SQL errors. If errors were found, he declared the page vulnerable to SQL injection. You could easily insert encoded <a> tags in place of x and then scrape the fetched page for a rendered link. One could theoretically run millions of these queries and build a massive network of XSS-driven links.
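A rough sketch of that probing step, assuming you already have a list of candidate URLs ending in an id parameter (the Google API harvesting step is left out, and the probe link is made up for illustration):

```php
<?php
// Substitute an <a> tag for the id value, fetch the page, and see whether
// the tag comes back unescaped. If it does, the page reflects user input
// straight into its HTML and can be abused for injected links.
$probe = '<a href="http://example.com/">probe</a>';   // hypothetical marker

function reflects_probe($base_url, $probe) {
    $html = @file_get_contents($base_url . urlencode($probe));
    if ($html === false) {
        return false;   // couldn't fetch, skip this one
    }
    // A verbatim (non-entity-escaped) match means the parameter is being
    // echoed into the page as-is.
    return strpos($html, $probe) !== false;
}

// Usage: feed it URLs that end in something like "page.php?id="
// var_dump(reflects_probe('http://example.com/page.php?id=', $probe));
```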
I agree that overall there is no solution that's going to fix all the XSS issues that exist today. Web development is easy to learn, and as a result you've got a ton of insecure sites cropping up all over the place. If every web developer had to become a seasoned security engineer before they could build a webpage, there probably wouldn't be much internet to surf. Webmasters should be educated about the issue, but ultimately I think the search engines are the ones who should be fighting the real fight against XSS vulnerabilities. It seems to me that the majority of XSS exploits have two things in common:
1. The page they're exploiting isn't linked to from the domain it's hosted on. It's a "floater."
2. Not always, but oftentimes the URL contains an encoded <a> tag.
Combining these two seems like a fairly effective way of pinpointing exploited pages, but then again I'm not a search engineer.
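For what it's worth, heuristic #2 on its own is easy to sketch -- something like the check below, which just looks for an encoded <a> tag hiding in a URL's query string. Purely illustrative; a real crawler-side check would obviously be more involved:

```php
<?php
// Flag URLs whose query string contains an <a ...> tag once decoded.
function has_encoded_anchor($url) {
    $query = parse_url($url, PHP_URL_QUERY);
    if (empty($query)) {
        return false;
    }
    // Decode twice to catch double-encoded payloads like %253Ca.
    $decoded = urldecode(urldecode($query));
    return stripos($decoded, '<a ') !== false;
}

// "%3Ca+href%3D..." decodes to "<a href=...", so this one gets flagged:
var_dump(has_encoded_anchor('http://example.com/search?q=%3Ca+href%3Dhttp%3A%2F%2Fspam.example%2F%3Espam%3C%2Fa%3E'));
// bool(true)
```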
Point 1... that used to be the case but isn't anymore. The XSS exploits have gotten much more advanced lately, using forums and other sites that normally have user input. If you look for the ones that aren't maintained, your XSS injections are likely to go unnoticed.
Point 2... Now, this part I can see catching some pages. HOWEVER, just because something is encoded does *not* make it evil :)
G-Man
No, but having an encoded <a> tag DOES make it evil :) I can't think of any non-malicious reason for having it in a URL.
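Just to show what we're talking about, here's the same made-up link payload in a few of the encodings that turn up in these URLs -- which is also why a filter that only greps for a literal "<a" misses a lot:

```php
<?php
// The same injected link, plain and in a couple of common encodings.
$tag = '<a href="http://spam.example/">link</a>';   // made-up payload

echo $tag, "\n";
// <a href="http://spam.example/">link</a>

echo urlencode($tag), "\n";
// %3Ca+href%3D%22http%3A%2F%2Fspam.example%2F%22%3Elink%3C%2Fa%3E

echo urlencode(urlencode($tag)), "\n";
// double-encoded: %253Ca%2Bhref%253D%2522http%253A%252F%252Fspam.example%252F%2522%253Elink%253C%252Fa%253E

echo htmlentities($tag), "\n";
// &lt;a href=&quot;http://spam.example/&quot;&gt;link&lt;/a&gt;
```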
LOL - I honestly don't know why anyone would want to encode anything, but there are so many ways to encode stuff that the designers of these specs must have had some nifty idea in mind at the time.
That'd be an interesting test for Google to run and report back to us on. It's a shame we can't give them suggestions, have them run them, and find out whether they work. They just take our ideas and then we never know if it was a good one or not :P
G-Man
Those problems with forums and other web apps have to do with the bad side of "the good and the bad" of dealing with open source apps. At a previous job, we had the Horde PHP framework running on one of our servers. A security vulnerability was "discovered" (really, some programmer just left a php eval() in the code, I think as a backdoor for himself), and it let someone install an FTP server, move over an IRC bot, and start our server serving files -- XDCC style.
The moral of the story is this: as long as there are programmers and programs, there will be programmers and programs trying to break them. We're never totally going to stop XSS, injection, or brute-force attacks. Tools and methods exist today to stop and prevent those types of problems. But like Matt said, web dev is so easy to get started with that there are more vulnerable sites than there should be.
The problem as I see it is that the SEs hold .edu and .gov links in such high regard, yet in many ways those sites are the most vulnerable to these types of attacks. .edu and .gov sites are full of old CGI scripts that are about as secure, nowadays, as a Fisher-Price toy safe. They were created in a time when hackers and their hacks were less sophisticated than they are today. And because of this high value, hackers will focus lots of their attention on breaking them -- the same reason Microsoft, which is CLOSED source, has lots of problems: it's popular software, and hacking it will cause maximum damage. (This is why nobody hacks SGI boxes.) If the SEs can find a way to weigh .edu and .gov links without making the vulnerable sites heavyweights in their value to a search engine, that would stop most of the problem.
That is, until the hackers find another weak link in the chain...