PHP offloading: Clone a website with wget


Want to clone a website?

With great power comes great responsibility, and this time that is not even a joke. Simplicity is the best way to fix most problems, and going back to our roots never hurts. Remember when all sites used to be static, but doing anything fun with them was a pain? The solution was to migrate to server-side technologies, which in turn became hard to optimize so they could serve content efficiently to the masses. In other words, we traded speed and robustness for more features and maintainability.

If you ever wanted to back up a site (read: make it static), one of the easiest and fastest ways to do so is something like this:

wget -q -O - http://www.example.com/ | grep -Eo "(https?|ftp)://[^'\" <>]+" | wget -r --tries=10 --wait=1 -q -i -

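Let me break it down for you. The first wget downloads a single web page, in this case www.example.com, and writes it to standard output (-O -). That output is piped to egrep (equivalent to grep -E), which prints every link it finds in the page, one per line. Those links are piped into the second wget, which reads them from standard input (-i -) and recursively downloads each one (-r), retrying up to 10 times and waiting one second between requests. Pretty cool, right?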
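If you would rather check what the link-matching step picked up before downloading anything, you can pull it out into a small function. This is a minimal sketch, not part of the original one-liner: `extract_links` and `links.txt` are hypothetical names, and it assumes a grep that supports -E and -o (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTML on stdin and print every absolute http, https
# or ftp URL it contains, one per line, sorted and de-duplicated.
# A match stops at a quote, a space or an angle bracket, so href="..." values come out clean.
extract_links() {
  grep -Eo "(https?|ftp)://[^'\" <>]+" | LC_ALL=C sort -u
}
```

Usage: run `wget -q -O - http://www.example.com/ | extract_links > links.txt`, review links.txt, then hand it to `wget -r --tries=10 --wait=1 -q -i links.txt`. Sorting with LC_ALL=C keeps the order identical on every machine, whatever the locale.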

What use do I have for this?

Plenty, I’d say. Two come to mind:

1. Back up your site’s structure. If all hell breaks loose, you don’t have to worry as MUCH about restoring your information, because you now have a set of static files that can be served much faster and with fewer server resources. That buys you time to figure out what happened.

2. Improve your site’s speed by taking load off the web server and the database: serve your old pages as static content. Simple, right?

Seriously, how often do you update that old content? Now imagine sending it to your CDN, not just your assets. How cool would that be?
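For the backup use case, GNU wget can also do the whole crawl on its own, with no link-matching step. This sketch wraps documented wget options in a hypothetical `backup_site` function. The default destination `./static-backup` is an assumption. Note that wget only rewrites links (--convert-links) after the whole download has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a browsable static copy of a site with GNU wget.
# Usage: backup_site URL [DEST_DIR]
backup_site() {
  local url="$1" dest="${2:-./static-backup}"
  # --mirror            recursion with infinite depth plus timestamping (-N)
  # --page-requisites   also fetch the CSS, JS and images each page needs
  # --convert-links     rewrite links so the copy works when served from elsewhere
  # --adjust-extension  append .html to HTML pages whose URL lacks it (e.g. page.php?id=1)
  # --no-parent         never climb above the starting directory
  # --wait / --tries    be gentle with the server and retry flaky requests
  wget --mirror --page-requisites --convert-links --adjust-extension \
       --no-parent --wait=1 --tries=10 \
       --directory-prefix="$dest" "$url"
}
```

Usage: `backup_site http://www.example.com/ /var/backups/example`. Because --mirror turns on timestamping, rerunning the same command later skips files the server reports as unchanged. Dynamic pages that send no Last-Modified header are fetched again anyway.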

Can I use this with WordPress or Magento?

You can use this trick with practically any platform or technology that serves websites, because wget only sees the HTML the server returns. Be aware, though, that egrep doesn’t discriminate between links, so you might want to make sure it only picks up links from your own website.
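One way to do that is to filter the link list by host before it reaches the second wget. A recursive wget does not follow links to other hosts unless you pass -H. However, each URL in the seed list is crawled on its own host, so a single third-party link there would pull in someone else’s site. Here is a minimal sketch. `same_site_links` is a hypothetical helper that keeps only URLs whose host is exactly the one you pass, with no port or user info.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read URLs (one per line) on stdin and keep only those
# whose host is exactly $1. Splitting on "/" makes field 1 the scheme,
# field 2 empty (the "//") and field 3 the host.
same_site_links() {
  awk -F/ -v host="$1" '$1 ~ /^(https?|ftp):$/ && $2 == "" && $3 == host'
}
```

Usage: `wget -q -O - http://www.example.com/ | grep -Eo "(https?|ftp)://[^'\" <>]+" | same_site_links www.example.com | wget -r --tries=10 --wait=1 -q -i -`. Comparing whole host fields avoids the classic substring trap, where www.example.com.evil.org would match a naive pattern.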

The idea behind this little trick is simple: combine your knowledge of the application with the right set of tools, decide whether each piece of content needs to be static or dynamic, and work out how to serve it faster and more efficiently. We can still use caching too, as long as we remember it is not the only tool we have for optimizing our sites.

Final thoughts

Before you go and implement this on a production site, remember all the rules of cache invalidation. This is obviously a compromise between the ease and maintainability of a dynamic site and the throughput of the server, and it is aimed primarily at blog platforms. In other words, static pages are harder to keep up to date, but they are much faster than dynamic pages, so formulate a plan for how you are going to “refresh” that content if it is ever updated. If you are using WordPress, you may want to look into the WP Super Cache plugin instead: it follows the same idea but is designed specifically for that CMS. Finally, you can make static copies of pages that contain forms, but NOT of the scripts those forms submit to (their action URLs). Those must stay dynamic.
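To find out which endpoints still need PHP behind them, you can scan the static copy for form targets. This sketch assumes the mirror lives in a local directory and that forms declare their target as action="..." or action='...'. `list_form_actions` is a hypothetical name, and the grep options are GNU grep’s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct form action targets found in a mirror.
# grep: -r recurse, -h no file names, -o only the match, -i any case,
#       -I skip binary files (images), -E extended regex.
# sed: drop everything up to "=" plus the opening quote, then the closing quote.
list_form_actions() {
  local dir="${1:-.}"
  grep -rhoiIE "action=(\"[^\"]*\"|'[^']*')" "$dir" \
    | sed 's/^[^=]*=.//; s/.$//' \
    | LC_ALL=C sort -u
}
```

Usage: `list_form_actions ./www.example.com`. An empty line in the output means a form with action="", which posts back to the page itself.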

So there you have it: we have offloaded some of the work from PHP and improved the site’s speed with a simple trick.


Author: Luis Tineo

Husband, Father, performance improvement junkie, biker and video gamer, Linux user and in my day job I'm a Systems Architect at Blue Acorn.

  • unlock2phone store

    I tried it but it didn’t worked

  • Rameez

    its working
