Mass FTP Crawling
By dsc - August 2015
The interesting files one can find on public FTP servers, combined with the technical challenge of building a decent search engine, motivated me to write Findex and, ultimately, this article.
This is an old issue
This article is by no means presented as 'new'. However, given that I was still able to index an enormous number of private files, I'd say it deserves some attention.
Scanning
Disclaimer
- Although not illegal, mass port scanning is a great way to get kicked off your ISP. I would not recommend doing it from your home connection.
- Indexing or crawling many FTP servers might not be to your ISP's liking either.
- Traversing public FTP servers is not illegal; however, it is the reader's responsibility to obey all applicable local, state and federal laws.
- I indexed the files; I did not download them.
- This article is presented as research. Nothing more.
I decided to concentrate on public FTP servers located in my own country, the Netherlands.
I used a list of IP blocks belonging to Dutch internet providers and started my scan. Because Findex can run distributed scans and crawls, the whole process took only half a day.
The scan turned up 257,807 FTP servers, of which 7,578 required no form of authentication. After filtering out the servers that did not contain any files, 2,359 public FTP servers remained. On those I discovered 18,088,392 files, a little over 18 million.
I had now indexed every file stored on the public FTP servers I could find in the Netherlands.
- 257,807 FTP servers
- 7,578 public FTP servers (2.9% of all FTP servers)
- 2,359 public FTP servers containing files (0.9% of all FTP servers)
- 18,088,392 files
- 438.994 TB of data
I forgot to check write permissions, so unfortunately I cannot say how many of these servers were writable.
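Both percentages in the list are relative to all 257,807 FTP servers, not to the public ones. A quick check in Python, using only the counts from the text (the helper `share` is just an illustrative name), also shows that only about a third of the public servers actually contained files:

```python
# Headline counts from the scan described above.
TOTAL_FTP = 257_807          # all FTP servers found
PUBLIC_FTP = 7_578           # servers that required no form of authentication
PUBLIC_WITH_FILES = 2_359    # public servers that contained at least one file


def share(part: int, whole: int) -> str:
    """Return part/whole as a percentage string with one decimal place."""
    return f"{100 * part / whole:.1f}%"


print("public / all:", share(PUBLIC_FTP, TOTAL_FTP))
print("public with files / all:", share(PUBLIC_WITH_FILES, TOTAL_FTP))
print("public with files / public:", share(PUBLIC_WITH_FILES, PUBLIC_FTP))
```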
Domains - Top 13
# | Domain | Public FTP Servers |
---|---|---|
1. | ziggo.nl | 397 |
2. | chello.nl | 153 |
3. | alphamegahosting.com | 98 |
4. | direct-adsl.nl | 95 |
5. | xs4all | 90 |
6. | kpn.net | 76 |
7. | planet.nl | 73 |
8. | zeelandnet.nl | 50 |
9. | caiway.nl | 46 |
10. | hetnet.nl | 46 |
11. | telfortglasvezel.nl | 42 |
12. | upc.nl | 41 |
13. | ziggozakelijk.nl | 32 |
All entries in the list are Dutch ISPs except for alphamegahosting.com, which appears to be a hosting company. Its servers probably come with a default public FTP account.
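The table groups servers by the domain of their reverse-DNS name. The article doesn't say exactly how Findex did this; the following is a minimal sketch assuming each IP's PTR record was looked up and reduced to its last two labels. The host names are made up, and `ptr_name` and `registered_domain` are hypothetical helpers:

```python
import socket
from collections import Counter
from typing import Optional


def ptr_name(ip: str) -> Optional[str]:
    """Reverse-DNS (PTR) lookup; returns None when no name is registered."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


def registered_domain(hostname: str) -> str:
    """Naively keep the last two labels of a hostname.

    Adequate for names under .nl or .com, but wrong for multi-label
    suffixes such as .co.uk; a public-suffix list is needed for those.
    """
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


# Hypothetical PTR names, for illustration only (no lookups are performed).
names = [
    "abc123.cable.dynamic.ziggo.nl",
    "def456.cable.dynamic.ziggo.nl",
    "ip1234.direct-adsl.nl",
    "web7.alphamegahosting.com.",
]
counts = Counter(registered_domain(host) for host in names)
for domain, n in counts.most_common():
    print(domain, n)
```

The two-label shortcut is why ziggo.nl and ziggozakelijk.nl would appear as separate rows, as they do in the table.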
Location
I geolocated all the IP addresses to determine their approximate physical locations.
The province of Drenthe has the fewest public FTP servers, probably because of its low population density. Noord-Holland has the most, which is consistent with its high population density.
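The article does not name the geolocation source it used. IP geolocation databases are essentially tables of address ranges with a location attached, queried by finding the most specific range that contains an address. A minimal sketch of that lookup with Python's `ipaddress` module, assuming a tiny hypothetical table (the prefixes are RFC 5737 documentation ranges and the province assignments are invented):

```python
import ipaddress
from collections import Counter
from typing import Optional

# Hypothetical range table. Real geolocation databases contain many
# thousands of ranges; these documentation prefixes stand in for them.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Noord-Holland"),
    (ipaddress.ip_network("192.0.2.128/25"), "Drenthe"),
    (ipaddress.ip_network("198.51.100.0/24"), "Zuid-Holland"),
]


def province_of(ip: str) -> Optional[str]:
    """Return the province of the most specific (longest) prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, prov) for net, prov in RANGES if addr in net]
    if not matches:
        return None
    # Longest prefix wins: a /25 is more specific than the /24 around it.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


servers = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
per_province = Counter(province_of(ip) for ip in servers)
print(per_province)
```

Province-level results are only as accurate as the underlying database, so per-province counts produced this way should be read as approximate.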
File categories
The following table shows the distribution of file categories.
```sql
SELECT file_format, sum(file_size), count(*) FROM files WHERE file_isdir != TRUE GROUP BY file_format;
```
Category | Files | Percentage of files | Size |
---|---|---|---|
Documents | 1,997,107 | 11.0% | 1.7 TB |
Movies | 282,306 | 1.6% | 75.3 TB |
Music | 1,046,560 | 5.8% | 8.6 TB |
Pictures | 5,175,332 | 28.6% | 8.8 TB |
Unidentified Files | 9,587,087 | 53.0% | 344 TB |
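The category counts add up to exactly the 18,088,392 files reported earlier, so each category's share of the file count follows directly from the table. A quick Python check (the helper `shares` is just an illustrative name):

```python
# File counts per category, taken from the table above.
COUNTS = {
    "Documents": 1_997_107,
    "Movies": 282_306,
    "Music": 1_046_560,
    "Pictures": 5_175_332,
    "Unidentified Files": 9_587_087,
}


def shares(counts: dict) -> dict:
    """Percentage of all files in each category, rounded to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


print(sum(COUNTS.values()))
for category, pct in shares(COUNTS).items():
    print(f"{category}: {pct}%")
```

Comparing this with the Size column is instructive: movies make up under 2% of the files but 75.3 TB of the data.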
Surprisingly, 28.6% of all the files indexed were pictures (over 5 million!).
I browsed through some of the data and concluded that most of the pictures were photographs.

Most of the photos I found were part of a collection, which suggests that people use their public FTP server as a backup device for their personal photographs.
File Extensions - Top 10
The following table shows the 10 most popular file extensions.
```sql
SELECT file_ext, count(*) FROM files WHERE file_isdir != TRUE GROUP BY file_ext ORDER BY count(*) DESC LIMIT 10;
```
# | Extension | Files |
---|---|---|
1. | .jpg | 4,114,712 |
2. | .deb | 2,039,029 |
3. | .mp3 | 869,530 |
4. | (no extension) | 720,040 |
5. | .png | 577,334 |
6. | .rpm | 550,756 |
7. | .gz | 466,525 |
8. | .html | 336,627 |
9. | .txt | 250,380 |
10. | .dsc | 195,674 |
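Row 4 of the table has no extension at all. The article does not show how `file_ext` is filled; assuming it is derived from the file name (for example with Python's `os.path.splitext`), files such as README or LICENSE get an empty extension, and the query groups all of them into one unnamed bucket. A self-contained sketch with SQLite and a hypothetical minimal schema (`add_entry` is an illustrative helper, not part of Findex):

```python
import os
import sqlite3

# Hypothetical minimal schema with only the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, file_ext TEXT, file_isdir INTEGER)")


def add_entry(path: str, is_dir: bool = False) -> None:
    # Assumption: the extension comes from the file name and is lowercased,
    # so .JPG and .jpg count together. splitext() returns '' for names
    # without an extension (README, LICENSE, ...).
    ext = "" if is_dir else os.path.splitext(path)[1].lower()
    conn.execute("INSERT INTO files VALUES (?, ?, ?)", (path, ext, int(is_dir)))


for p in ["/photos/img001.JPG", "/photos/img002.jpg", "/pub/README",
          "/pub/LICENSE", "/pub/COPYING", "/music/song.mp3"]:
    add_entry(p)
add_entry("/photos", is_dir=True)  # directories are excluded by the query

# Same query as above; NOT file_isdir stands in for != TRUE.
top = conn.execute(
    "SELECT file_ext, count(*) FROM files WHERE NOT file_isdir "
    "GROUP BY file_ext ORDER BY count(*) DESC LIMIT 10"
).fetchall()
print(top)
```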
Sensitive Files
And now for the more juicy stuff... Sensitive files can be found by searching the index for revealing keywords in file names.
```sql
SELECT count(*) FROM files WHERE file_isdir != TRUE AND file_format = 1 AND searchable LIKE 'keyword%';
```
Keyword | Files | Description |
---|---|---|
'wachtwoord' and 'password' | 396 | 'wachtwoord' means 'password' in Dutch. Text documents containing lists of passwords came up. |
passport | 192 | Images and documents of passports |
belastingaangifte | 517 | 'belastingaangifte' means 'tax return' in Dutch. Tax documents came up. |
'factuur' and 'invoice' | 4,544 | 'factuur' means 'invoice' in Dutch. Many invoices came up. |
creditcard | 139 | Photos and documents of credit cards |
gemeente | 614 | 'gemeente' means 'municipality' in Dutch. Government-related documents came up. |
wp-config.php | 32 | Configuration file for WordPress |
configuration.php | 61 | Configuration file for Joomla |
config.php | 428 | Configuration files for various other web applications |
passwd | 82 | User account file on Unix systems (as in /etc/passwd) |
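The query above uses `'keyword%'`, which in SQL matches only values that *start* with the keyword. Whether that undercounts depends on what the `searchable` column holds, which the article does not say: if it holds the full file name, a name like oude_factuur.pdf would be missed. A small SQLite sketch (the table contents and the `count_like` helper are hypothetical) shows the difference:

```python
import sqlite3

# Hypothetical: 'searchable' holds the lowercased file name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (searchable TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("factuur_2014_03.pdf",), ("oude_factuur.pdf",), ("notes.txt",)],
)


def count_like(pattern: str) -> int:
    """Count rows whose searchable column matches a LIKE pattern."""
    (n,) = conn.execute(
        "SELECT count(*) FROM files WHERE searchable LIKE ?", (pattern,)
    ).fetchone()
    return n


print(count_like("factuur%"))   # names that start with the keyword
print(count_like("%factuur%"))  # names that contain it anywhere
```

SQLite's LIKE is case-insensitive for ASCII letters by default; in PostgreSQL, LIKE is case-sensitive and ILIKE is the case-insensitive variant. Note also that `_` is itself a single-character wildcard in LIKE patterns.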
I viewed a few of the files, and they were indeed what their filenames suggested.
The most sensitive files I found were documents belonging to a court, describing court hearings and cases in detail, along with personal information about the people involved (judges, defendants, lawyers, etc.).
There were also many documents belonging to a property valuation company: floor plans, prices and other details of universities, police stations and large companies.
Responsible Disclosure
I will not publish any of these documents or pictures. However, I will not notify the affected parties either, for two reasons:
- Retaliation
- Too many hosts
I've already been kicked off my ISP once for responsibly alerting someone to a vulnerability, and I can tell you it is not fun. Also, I estimate that more than half of the 2,359 public FTP servers containing files were public by accident and exposing sensitive files. That would mean warning more than a thousand people or companies. ;D
Conclusion
Even in 2015, many public FTP servers on the internet still host sensitive files. I could have downloaded a wide variety of sensitive documents, and it is very likely that others are doing exactly that.