Sitemap RSS Howto
This page discusses the Google Sitemap feature and how it can be used. I have example code for setting up a sitemap for TWiki as well as Gallery1.
One interesting thing that I discovered while working on a customer site was the Google Sitemap feature. This allows you to publish the list of URLs currently available on your site to Google directly. You can read about the format at https://www.google.com/webmasters/tools/docs/en/protocol.html
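You can also advertise your sitemap by adding a line like the following to your robots.txt file. The Sitemap directive is independent of any User-agent block, so it can go anywhere in the file, not just on the first line: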
Sitemap: http://wiki.pachogrande.com/sitemap.xml
You can give hints about each page on your site, such as:
* frequency of updates on a particular page (changefreq)
* relative priority of a particular page compared with the other pages on your site (priority)
* last modification date of a particular page (lastmod)
In all honesty, 99% of the time all you want to create is a list of the URLs that are available on your site. This is handy when not every page is reachable through internal links, for example when you're using a CMS, wiki or similar.
MSN and Yahoo both support the sitemap format as well.
The simplest sitemap you can create would be similar to this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://sitename.com/dev/3/Inventory+page</loc>
<lastmod>2007-02-09</lastmod>
</url>
<url>
<loc>http://sitename.com/dev/1/Home</loc>
<lastmod>2007-03-30</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
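The optional fields in the second entry above could be emitted from PHP along these lines. This is only a sketch: sitemap_url_entry is a made-up helper name, not something the scripts on this page use, and the changefreq values are the ones listed in the sitemap protocol.

```php
<?php
// Hypothetical helper (not part of the scripts on this page): renders one <url> entry.
// $lastmod is a Unix timestamp; pass null for any optional field to leave its tag out.
function sitemap_url_entry($loc, $lastmod = null, $changefreq = null, $priority = null) {
    // Values the sitemap protocol allows for <changefreq>.
    $allowed = array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never');
    // Escape the URL so characters such as & stay valid inside XML.
    $xml = "<url>\n\t<loc>" . htmlspecialchars($loc) . "</loc>\n";
    if ($lastmod !== null) {
        // date('c') produces an ISO 8601 timestamp in PHP's configured time zone.
        $xml .= "\t<lastmod>" . date('c', $lastmod) . "</lastmod>\n";
    }
    if ($changefreq !== null && in_array($changefreq, $allowed, true)) {
        $xml .= "\t<changefreq>" . $changefreq . "</changefreq>\n";
    }
    if ($priority !== null) {
        // Priority is relative to other pages on the same site; clamp it to 0.0 .. 1.0.
        $clamped = max(0.0, min(1.0, $priority));
        $xml .= "\t<priority>" . number_format($clamped, 1, '.', '') . "</priority>\n";
    }
    return $xml . "</url>\n";
}

// Example: roughly the second entry of the sample sitemap.
print(sitemap_url_entry('http://sitename.com/dev/1/Home', mktime(0, 0, 0, 3, 30, 2007), 'daily', 0.8));
```

Unknown changefreq values are silently dropped here rather than reported; whether that is the right behaviour depends on how strict you want the generator to be.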
Currently I'm using the following to provide my sitemap.xml at http://wiki.pachogrande.com/sitemap.xml:
<?php
// Directory holding the TWiki topic files for the Public web.
$datadir = '/usr/local/PachograndeWiki/data/Public';
if ($dir = opendir($datadir)) {
    header("Content-Type: text/xml");
    print('<?xml version="1.0" encoding="UTF-8"?>' . "\n");
    print('<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"' . "\n");
    print(' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9' . "\n");
    print(' http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"' . "\n");
    print(' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");
    print('' . "\n");
    while (false !== ($file = readdir($dir))) {
        // Skip dotfiles, RCS history files (,v) and Web* system topics other than WebHome.
        if (preg_match('/^[.]/', $file)) { continue; }
        if (preg_match('/,v$/', $file)) { continue; }
        if (preg_match('/^Web/', $file) && $file != 'WebHome.txt') { continue; }
        $filenoext = preg_replace('/[.]txt$/', '', $file);
        $filefull = $datadir . '/' . $file;
        // date("c") yields an ISO 8601 timestamp in PHP's configured time zone.
        $filemod = date("c", filemtime($filefull));
        $url = 'http://wiki.pachogrande.com/twiki/bin/view/Public/' . $filenoext;
        // Escape the URL so characters such as & stay valid inside XML.
        printf("<url>\n\t<loc>%s</loc>\n\t<lastmod>%s</lastmod>\n</url>\n", htmlspecialchars($url), $filemod);
    }
    closedir($dir);
    print('</urlset>');
}
?>
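Before submitting a generated sitemap it can be worth checking that the output actually parses. A rough sketch, assuming PHP's SimpleXML extension is available; sitemap_locations is a made-up name and the sample document is only for illustration.

```php
<?php
// Hypothetical checker: returns the list of <loc> values found in a sitemap,
// or false if the XML is not well-formed.
function sitemap_locations($xmlstring) {
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlstring);
    if ($xml === false) {
        return false;
    }
    $locs = array();
    foreach ($xml->url as $url) {
        $locs[] = (string) $url->loc;
    }
    return $locs;
}

// A one-entry sitemap used as sample input.
$sample = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
    . "<url>\n\t<loc>http://sitename.com/dev/1/Home</loc>\n</url>\n"
    . '</urlset>';
print_r(sitemap_locations($sample));
```

To check a live sitemap, the string could come from file_get_contents() on the sitemap URL instead of the sample. This only tests well-formedness, not validity against the sitemap schema.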
I've got another piece of code that handles my Gallery1 installation:
<?php
// Load the Gallery1 classes so unserialize() can rebuild the stored objects.
require("/usr/share/gallery/classes/Album.php");
require("/usr/share/gallery/classes/AlbumItem.php");
require("/usr/share/gallery/classes/Image.php");

// Format a Unix timestamp as ISO 8601 using PHP's configured time zone,
// rather than a hard-coded -08:00 offset (which is wrong during daylight saving time).
function iso8601($timestamp) {
    return date('c', $timestamp);
}

/* Config starts here */
$albumbase = "/usr/share/gallery/albums";
$baseurl = "http://gallery.pachogrande.com";
/* Config end */

// Check the album database before sending any XML, so an error never ends up inside the sitemap.
$albumfile = $albumbase . "/albumdb.dat";
if (! is_readable($albumfile)) {
    die("Fatal error: Cannot read albumfile $albumfile");
}

header("Content-Type: text/xml");
print('<?xml version="1.0" encoding="UTF-8"?>' . "\n");
print('<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"' . "\n");
print(' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9' . "\n");
print(' http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"' . "\n");
print(' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");
print('' . "\n");

$albumdata = unserialize(file_get_contents($albumfile));
foreach ($albumdata as $albumname) {
    $albuminfofile = $albumbase . '/' . $albumname . '/' . "album.dat";
    if (! is_readable($albuminfofile)) {
        continue;
    }
    $albuminfodata = unserialize(file_get_contents($albuminfofile));
    // album URL is http://gallery.pachogrande.com/$albumname
    // album modified as unix epoch is $albuminfodata->fields["last_mod_time"]
    $url = $baseurl . '/' . $albumname;
    $filemod = iso8601($albuminfodata->fields["last_mod_time"]);
    printf("<url>\n\t<loc>%s</loc>\n\t<lastmod>%s</lastmod>\n</url>\n", htmlspecialchars($url), $filemod);

    $picfile = $albumbase . '/' . $albumname . '/' . "photos.dat";
    if (! is_readable($picfile)) {
        continue;
    }
    $picdata = unserialize(file_get_contents($picfile));
    foreach ($picdata as $id => $data) {
        // photo URL is http://gallery.pachogrande.com/$albumname/$name
        // photo modified as unix epoch is $last_mod
        $name = $data->image->name;
        $last_mod = $data->uploadDate;
        $url = $baseurl . '/' . $albumname . '/' . $name;
        $filemod = iso8601($last_mod);
        printf("<url>\n\t<loc>%s</loc>\n\t<lastmod>%s</lastmod>\n</url>\n", htmlspecialchars($url), $filemod);
    }
}
print('</urlset>');
?>
RSS in a nutshell:
<?php header("Content-Type: text/xml"); ?>
<?xml version="1.0"?>
<rss version="2.0">
channel
  * title
  * link
  * description
  * language, e.g. en-us
  * pubDate
  * lastBuildDate
  * item (repeated once per story in the channel)
    * title
    * link
    * description
    * pubDate
    * guid, isPermaLink="true" - content is the same as link
</rss>
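The outline above could be turned into a working feed roughly as follows. This is a minimal sketch rather than a finished implementation: rss_tag and rss_feed are made-up helper names, the array keys are my own choice, and the sample channel and story are invented for illustration.

```php
<?php
// Hypothetical helper: one XML element with escaped text content.
function rss_tag($name, $value, $attrs = '') {
    return '<' . $name . $attrs . '>' . htmlspecialchars($value) . '</' . $name . ">\n";
}

// Hypothetical helper: builds an RSS 2.0 document from a channel array and a list of items.
// Timestamps are Unix epochs; date('r') formats them as RFC 2822 dates, as RSS 2.0 expects.
function rss_feed($channel, $items) {
    $xml  = '<?xml version="1.0"?>' . "\n";
    $xml .= '<rss version="2.0">' . "\n<channel>\n";
    $xml .= rss_tag('title', $channel['title']);
    $xml .= rss_tag('link', $channel['link']);
    $xml .= rss_tag('description', $channel['description']);
    $xml .= rss_tag('language', 'en-us');
    $xml .= rss_tag('pubDate', date('r', $channel['pubDate']));
    $xml .= rss_tag('lastBuildDate', date('r', time()));
    foreach ($items as $item) {
        $xml .= "<item>\n";
        $xml .= rss_tag('title', $item['title']);
        $xml .= rss_tag('link', $item['link']);
        $xml .= rss_tag('description', $item['description']);
        $xml .= rss_tag('pubDate', date('r', $item['pubDate']));
        // The permalink doubles as the item's unique id.
        $xml .= rss_tag('guid', $item['link'], ' isPermaLink="true"');
        $xml .= "</item>\n";
    }
    return $xml . "</channel>\n</rss>\n";
}

// Sample data, invented for illustration.
$channel = array(
    'title' => 'Example site news',
    'link' => 'http://sitename.com/',
    'description' => 'Latest stories',
    'pubDate' => mktime(0, 0, 0, 3, 30, 2007),
);
$stories = array(
    array(
        'title' => 'First story',
        'link' => 'http://sitename.com/story/1',
        'description' => 'Sitemaps & feeds',
        'pubDate' => mktime(0, 0, 0, 3, 30, 2007),
    ),
);
print(rss_feed($channel, $stories));
```

When serving this from a web server, send header("Content-Type: text/xml") before printing. Building the XML declaration as a PHP string also sidesteps the case where short_open_tag is enabled and a literal <?xml line in the template would be read as PHP code.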