Models and classes
django-google-dork is built around few models and classes.
On one hand Django models are used to manage your search campaigns and
store their results, on the other hand a couple of ad-hoc class are
used to perform the actual work in an asynchronous way.
Class diagram
Django Models <- | -> Regular classes
+--------+ * +----------+
|Campaign|------------, | Scrapper |
+--------+ ' +----------+
' ' '
'* ' '
+--------+ * ' ' +----------+
| Dork |-------, ' `---|Downloader|
+--------+ 0..1' '0..1 ' +----------+
' +----------+ '
'* ,-| Engine | '
+--------+ * ' +----------+ ' +----------+
| Run |---' `---| Parser |
| |--- +----------+
+--------+ * ' +----------+
'* `-|Downloader|
'* +----------+
+--------+
| Result |
+--------+
Class outline
- Django models:
Campaign: A logical container forDorksDork: A search expression or queryEngine: A google engine, eg: google.com, google.fr, google.co.nz, ...Run: A search instance of aDorkagainst a googleEngine, yieldingResultsResult: A search result, composed of atitle, aurland asummaryDownloader: Downloads a given URL. FIXME
- Helper classes:
Scrapper: Downloads pages, parse them and recurse if neededDownloader: Downloads a given url. FIXMEParser: Parse a page and returns search results and page links
Asynchronous tasks
Under the hood django-google-dork uses django-celery to perform the background job, all you need to do is configure a periodic job to call the main task: django_google_dork.dork_tem_all.
For each Dork of each Campaign, dork_them_all will download the initial result page, and then in a recursive way, it will create additional asynchronous task to scrap all the other pages up to the last one.
TODO?
- Add a number of page limit to
Campaign
Diversity
Dork
A Campaign allows you to use more than a Dork to find what you are
after, might it be because you're looking for something that requires
internationalised dork, for example "Powered by Wordpress" and
"Fièrement propulsé par WordPress" will gives you both English and
French websites based on WordPress.
Engines
Using google.fr to search for French stuff lead to significant
better results, thus the Engine model allows you to optionally
associate a specific Engine to a Campaign or a Dork.
At the same time using google.com.br to search for Russian stuff
might brings you unexpected but yet interesting results. This is why
if no Engine is associated with a Campaign or a Dork, then an
engine will be selected randomly.
If the list of Engine is empty, a default one will be created using
the hostname google.com.
Downloaders
To avoid get banned by google search engines, you need to use
different IP addresses, that's what the Downloader is for. It's up
to you to implement it, you could for example use the python
requests module to proxy the queries to a pool of external machine.
The whole Dork run will be done using the same Downloader and the same
google Engine.
Example
Let say that you want to find r57 php shells around, so using the Django Admin Panel:
- You create a campaign that you name r57 PHP Sh3lls
- You create a dork with the query intitle:r57shell +uname +rwx and you assoicate it with the previous
Campaign - You create a new periodic task (let say that run every 6 hours), and you select the
django_google_dork.dork_temtask
You can nowmonitor your Celery queue and as the tasks get executed you can start looking at the list of results.