Using the Google Safe Browsing API to check shorter urls.

I’ve been wondering for a while how many of shortner’s short URLs lead to a site containing malware.

Luckily, Google is here to lend a helping hand when checking for bad URLs.

In simple terms, the Google Safe Browsing API lets you download a set of hashes of URLs that contain nasties, plus updates to keep your local copy of that list current.
After that you create a list of possible URLs to check, create MD5 hashes of those, and check the hashes against the list you have. If you get a hit, there is a big chance the URL contains some form of malware. I don’t want my users to end up on a site with malware, so when that happens I can stop the redirect and give people the choice to continue or not.
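
A rough sketch of that lookup in Python, to make the steps concrete. The function names (`candidate_urls`, `md5_hex`, `is_flagged`) are made up for illustration, the local list is assumed to be a plain set of hex MD5 strings, and the way candidates are built here (host suffixes combined with path prefixes) is a simplification: the real API has its own canonicalization rules that this does not reproduce.

```python
import hashlib
from urllib.parse import urlsplit


def candidate_urls(url):
    """Build host/path combinations of a URL that might appear on the list.

    Simplified assumption: shorter host suffixes combined with path prefixes.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    path = parts.path or "/"

    # Full host first, then shorter suffixes down to the last two labels.
    labels = host.split(".")
    hosts = [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]

    # Root, each directory prefix, the full path, and the path with its query.
    segments = [s for s in path.split("/") if s]
    paths = ["/"]
    for i in range(1, len(segments)):
        paths.append("/" + "/".join(segments[:i]) + "/")
    if path != "/":
        paths.append(path)
    if parts.query:
        paths.append(path + "?" + parts.query)

    return [h + p for h in hosts for p in paths]


def md5_hex(text):
    """Hex MD5 digest of a string (used for list matching, not for security)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_flagged(url, bad_hashes):
    """True if any candidate form of the URL hashes to an entry in bad_hashes."""
    return any(md5_hex(c) in bad_hashes for c in candidate_urls(url))


# Hypothetical local list with a single flagged entry.
bad_hashes = {md5_hex("evil.example.com/")}
if is_flagged("http://evil.example.com/path/page.html", bad_hashes):
    print("Stop the redirect and ask the visitor whether to continue.")
```

Keeping the list in a set makes each membership check cheap, so testing a dozen candidates per short URL adds very little work to a redirect.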

Anyway, you can find more information on this awesome thing over here: 

Oh, and thanks, Google, for putting this awesome thing out there for developers to use to protect their users.

5 thoughts on “Using the Google Safe Browsing API to check shorter urls.”

  1. First of all, thanks for the input.

    Although it is interesting, the example code zip files for PHP are, well, empty.

    The Google Safe Browsing API cuts down on latency by storing the entire DB locally, which means no slow round trips to BrightCloud’s server.

    Besides that, I have no idea what kind of information I’ll be getting, or how fine-grained it is. There is some example return information, but it’s vague. There is no list of categories visible.

    I think having an “abortion pro choice” category and indexing it under “Legal Liability” is, let’s say, a wrong move (especially if you use such a topic for an example). Why not use “porn” as an example?

    All I want to do is :
    Make sure my visitors are safe from malware and viruses, that’s it. I think the Google Safe Browsing API fits better with that.

    Oh and:
    “You cannot access our data if youre a competitor, or work with a competitor.”

    might become a problem ;)

  2. Hey –

    Which link did you try to access for the PHP code? Was it the REST code on this page?

    It works for me now (always has) — please let me know if it’s still empty for you!

    We definitely need to do a better job on the documentation, I’ll own that for sure. :)

    Based on your comments I’m assuming you read this?

    What you were reading is the first few lines returned from a call to retrieve the entire list of categories we have. Still, you’re right we could use a less contentious category to demonstrate with… :)

    Please know that both “Abortion” categories (e.g. pro choice and pro life) are just under the more general category group of “Legal Liability” because some organizations have legal issues with these categories (schools for example). There is no “value judgement” we’re making with this. Other category groups are around productivity (e.g. sports sites, youtube) and security (malware, phishing, spam, etc.).

    I think you’re interested in only the security categories. We have a lot of users who are only interested in those.

    If you skip down to the section in the tutorial which says “Retrieve categories for a URL,” the result of a lookup is an XML message which returns essentially this:


    Thus, this URL was in category ID 50, we also let you know that we think it’s in that category with a “confidence” of 85 (scale is 1-100).

    You can do a getcatlist call to get the entire list of 80+ categories and their associated IDs, then store them in a hash on your side. This helps keep each message shorter.

    You mentioned that all you want to do is keep people safe from malware, etc., so ignore whether we think the URL is in Abortion. :) By doing a lookup against our system you should just check to see if we think it’s in the security categories (phishing, malware, spam, etc.). If it is, take action to protect your users. If it’s not, then let them through.

    Our list is MUCH more comprehensive than the GSBAPI, which means that it’s less economical for us or you to download the entire thing. Instead you do a call per item you’re checking. However, we encourage you to do caching to reduce the number of calls you have to make to us within a reasonable period of time. You don’t want to cache for too long, otherwise your cached data will be stale.

    The full legalese is more specific, and I encourage you to read it to be sure, but as long as you’re not trying to create a web filtering database you should be fine. :)

  3. My name is Sanjay Kumar and I operate moneyinhands .com. Recently I found an iframe element on my website. This element was added to my website automatically, and I don’t know anything about it. Please tell me how I can protect my website moneyinhands .com against such elements or malware in future. Please also suggest any software or script that can help me protect my web pages against such elements and malware. I would also be interested to know how I can get my website shown as safe again.

  4. Sanjay,

    If I understand you correctly: you recently found an iframe on your site, got flagged for malware, and you’re looking for some way to protect against that?

    Basically there are a few things (pretty basic really) you can do to protect yourself as best as possible against malware invasions:
    – Keep your software up to date.
    – If your site is your company’s face and what makes it money, don’t host it on a shared hosting service. A compromised site belonging to another user could compromise yours.
    – Keep your software up to date. Can’t stress this one too much.
    – If you write your own software, run it on your own server, and keep getting hacked or having malware placed while your software is up to date, check two things: 1. whether there is a new user on your server that should not be there, and 2. whether your own software has any exploitable holes.

    As for getting off the blacklist:
