Sitemaps help search engines discover all relevant pages and content on a website. Sitemaps for images exist as well, but the focus here is on web pages only.
How can I generate a sitemap?
A sitemap can be created in various ways. If you are using a framework such as Laravel, you can generate it on the fly or whenever you publish or update your content.
After some experiments and checking several solutions on GitHub, I couldn't find the solution I was looking for:
A simple, permanent crawler of the actual website.
It considers noindex robots tags as well as canonicals and, of course, the article:modified_time tag.
It ignores JavaScript, as Google mostly does. This lets it run much faster than starting a headless browser only to read a pure HTML5/CSS3 page.
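To make these requirements concrete, here is a minimal sketch of how the three signals could be read from a page's raw HTML without running any JavaScript. It is not the package's code: the function name extractPageSignals and the example URL are made up for illustration, and it uses PHP's DOM extension (ext-dom) where a regex-based approach would also work.

```php
<?php

declare(strict_types=1);

/**
 * Extract the sitemap-relevant signals from a page's HTML.
 * Hypothetical helper for illustration, not part of any package.
 *
 * @return array{noindex: bool, canonical: ?string, modified: ?string}
 */
function extractPageSignals(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // <meta name="robots" content="noindex, follow">
    $robots = $xpath->evaluate('string(//meta[@name="robots"]/@content)');
    // <link rel="canonical" href="...">
    $canonical = $xpath->evaluate('string(//link[@rel="canonical"]/@href)');
    // <meta property="article:modified_time" content="...">
    $modified = $xpath->evaluate(
        'string(//meta[@property="article:modified_time"]/@content)'
    );

    return [
        'noindex'   => stripos($robots, 'noindex') !== false,
        'canonical' => $canonical !== '' ? $canonical : null,
        'modified'  => $modified !== '' ? $modified : null,
    ];
}

// Example page (assumed markup) that should be skipped because of noindex.
$html = '<html><head>'
    . '<meta name="robots" content="noindex, follow">'
    . '<link rel="canonical" href="https://example.com/blog/post">'
    . '<meta property="article:modified_time" content="2020-01-15T10:00:00+00:00">'
    . '</head><body></body></html>';

$signals = extractPageSignals($html);
```

A crawler built this way would skip pages where noindex is true, list the canonical URL instead of the crawled one when they differ, and use the modified value for the lastmod field.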
My solution for sitemaps
As mentioned, I couldn't find what I had in mind. So, being a developer at heart, I opted to build my own solution. It relies heavily on PHP Spider, a crawler package for PHP. In addition, the package uses some regular expressions to identify the most interesting parts of each page. Other values, such as priority, are guessed from the page's depth within the website (its nesting level). More detail can be found in the GitHub repo for Laravel-Sitemaps.
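The idea of guessing priority from nesting depth can be sketched as follows. The mapping used here (1.0 for the homepage, minus 0.2 per path level, never below 0.1) and the function name guessPriority are assumptions chosen for illustration; the package's actual formula may differ.

```php
<?php

declare(strict_types=1);

/**
 * Guess a sitemap priority from how deeply a URL is nested.
 * Hypothetical mapping: 1.0 for the homepage, minus 0.2 per level, floor 0.1.
 */
function guessPriority(string $url): float
{
    // parse_url() returns null when there is no path and false on bad input.
    $path = parse_url($url, PHP_URL_PATH);
    $path = is_string($path) ? $path : '';

    // Count the non-empty path segments: "/" has 0, "/blog/post" has 2.
    $segments = array_filter(
        explode('/', $path),
        static fn (string $segment): bool => $segment !== ''
    );
    $depth = count($segments);

    return max(0.1, round(1.0 - 0.2 * $depth, 1));
}

$home = guessPriority('https://example.com/');
$post = guessPriority('https://example.com/blog/post');
$deep = guessPriority('https://example.com/a/b/c/d/e/f');
```

A linear falloff keeps the homepage and top-level sections ahead of deeply nested pages, which is usually what a site owner wants when no explicit priority has been set.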
How can I get this?
The package is distributed via Composer and can be installed using:
This will automatically configure the required Laravel ServiceProvider.
How to use the package
The package registers an Artisan command called sitemap:generate. It crawls your site and writes out the sitemap. For convenience, you can add this command to your deployment steps.
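For readers curious what the "writing out" step involves, here is a minimal sketch that turns a list of crawled entries into sitemap XML using PHP's XMLWriter extension. The function name renderSitemap, the entry array shape, and the example URLs are assumptions for illustration, not the package's internals.

```php
<?php

declare(strict_types=1);

/**
 * Render crawled entries as a sitemap.xml string.
 * Hypothetical helper; the entry shape is assumed for this sketch.
 *
 * @param array<int, array{loc: string, lastmod?: string, priority?: float}> $entries
 */
function renderSitemap(array $entries): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($entries as $entry) {
        $xml->startElement('url');
        // writeElement() escapes characters such as "&" in the URL.
        $xml->writeElement('loc', $entry['loc']);
        if (isset($entry['lastmod'])) {
            $xml->writeElement('lastmod', $entry['lastmod']);
        }
        if (isset($entry['priority'])) {
            $xml->writeElement(
                'priority',
                number_format($entry['priority'], 1, '.', '')
            );
        }
        $xml->endElement(); // closes <url>
    }

    $xml->endElement(); // closes <urlset>
    $xml->endDocument();

    return $xml->outputMemory();
}

$sitemap = renderSitemap([
    ['loc' => 'https://example.com/', 'priority' => 1.0],
    [
        'loc'      => 'https://example.com/blog/post?a=1&b=2',
        'lastmod'  => '2020-01-15T10:00:00+00:00',
        'priority' => 0.6,
    ],
]);
```

The resulting string could then be written to public/sitemap.xml, for example with file_put_contents().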
Regular updates of the sitemap
If you'd like to update the sitemap.xml regularly, you can add a new line to the schedule method in app/Console/Kernel.php:
/**
 * Define the application's command schedule.
 *
 * @param  \Illuminate\Console\Scheduling\Schedule  $schedule
 * @return void
 */
protected function schedule(Schedule $schedule)
{
    // Regenerate the sitemap once a day at midnight...
    $schedule->command('sitemap:generate')->daily();

    // ...or at a defined time instead (use one of the two, not both):
    // $schedule->command('sitemap:generate')->dailyAt('02:50');
}
Summary & Questions
If you run into problems, please raise an issue on GitHub. To stay updated, please subscribe to my newsletter. More information can also be found in the BYOI article about the Laravel Sitemap Generator.