Jan 22, 2009

Sitemaps XML format

Sitemap protocol

This post describes the XML schema for the Sitemap protocol.
The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped, and the file itself must be UTF-8 encoded.
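As a rough illustration of the escaping and encoding requirements, the sketch below uses Python's standard `xml.sax.saxutils.escape`. The helper names `escape_sitemap_value` and `encode_sitemap` and the sample URL are assumptions made for this example, not part of the protocol.

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; the two quote characters
# are passed explicitly so that all five XML entities are covered.
_QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_sitemap_value(value: str) -> str:
    """Return value with &, <, >, ' and " replaced by XML entities (hypothetical helper)."""
    return escape(value, _QUOTE_ENTITIES)

def encode_sitemap(xml_text: str) -> bytes:
    """Encode the finished document as UTF-8 bytes before it is written to disk."""
    return xml_text.encode("utf-8")

# Hypothetical URL containing an ampersand that must be escaped.
escaped = escape_sitemap_value("http://www.example.com/view?item=12&desc=vacation")
print(escaped)
```

The ampersand in the query string comes out as `&amp;`, which is the form the URL should take inside a `<loc>` tag.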
The Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.

All other tags are optional, and support for these optional tags may vary among search engines. Also, all URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com.
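The single-host rule can be checked before a Sitemap is written. The following is a minimal sketch using `urllib.parse.urlsplit`; the function names and sample URL lists are assumptions for illustration, and the host is compared together with any port number.

```python
from urllib.parse import urlsplit

def hosts_in(urls):
    """Return the set of distinct hosts (lowercased) found in the given URLs."""
    return {urlsplit(url).netloc.lower() for url in urls}

def is_single_host(urls):
    """True when every URL shares one host; an empty list yields False."""
    return len(hosts_in(urls)) == 1

# Hypothetical URL lists: the first stays on one host, the second mixes two.
same_host = ["http://www.example.com/", "http://www.example.com/about/"]
mixed_hosts = ["http://www.example.com/", "http://store.example.com/cart/"]

print(is_single_host(same_host))
print(is_single_host(mixed_hosts))
```

A list that mixes www.example.com and store.example.com would need to be split into one Sitemap per host.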

Sample XML Sitemap

The following example shows a Sitemap that contains just one URL and uses all optional tags.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
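A file like the sample above could be generated rather than written by hand. The sketch below uses the standard library's `xml.etree.ElementTree` (Python 3.8 or later for the `xml_declaration` argument); `build_sitemap` and the dictionary layout of each entry are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample Sitemap.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a Sitemap from a list of dicts and return it as UTF-8 bytes.

    Each dict must contain "loc"; "lastmod", "changefreq" and "priority"
    are written only when present. (Hypothetical helper.)
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree entity-escapes text content when serializing.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

sitemap_bytes = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2005-01-01",
        "changefreq": "monthly",
        "priority": 0.8,
    }
])
print(sitemap_bytes.decode("utf-8"))
```

Because the serializer escapes text content itself, values should be passed in unescaped to avoid double escaping.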

The available XML tags are described below.

Tag and description
<urlset> required

Encapsulates the file and references the current protocol standard.

<url> required

Parent tag for each URL entry. The remaining tags are children of this tag.

<loc> required

URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters.
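These two constraints on `<loc>` can be checked mechanically. The sketch below is one possible check; the name `is_valid_loc` is hypothetical, and restricting the protocol to http and https is an assumption of this example rather than something stated here.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # the value must be shorter than this many characters

def is_valid_loc(url: str) -> bool:
    """Check that url starts with a protocol and is under the length limit."""
    parts = urlsplit(url)
    # Assumption: only http and https are accepted as protocols.
    has_protocol = parts.scheme in ("http", "https") and bool(parts.netloc)
    return has_protocol and len(url) < MAX_LOC_LENGTH

print(is_valid_loc("http://www.example.com/"))
print(is_valid_loc("www.example.com/"))
```

The second call fails the check because the URL lacks a protocol.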

<lastmod> optional

The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.

Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently.
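For reference, a `<lastmod>` value in W3C Datetime format can be produced with Python's `datetime` module. This is a minimal sketch; `format_lastmod` is a hypothetical helper, and rejecting datetimes without a timezone is a choice made for this example, since the full W3C Datetime form includes a timezone designator.

```python
from datetime import date, datetime, timezone

def format_lastmod(value) -> str:
    """Format a date as YYYY-MM-DD, or an aware datetime with time and UTC offset."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must carry a timezone")
        # Produces e.g. 2005-01-01T12:30:00+00:00
        return value.isoformat(timespec="seconds")
    return value.isoformat()

print(format_lastmod(date(2005, 1, 1)))
print(format_lastmod(datetime(2005, 1, 1, 12, 30, 0, tzinfo=timezone.utc)))
```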

<changefreq> optional

How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs.

The value of this tag is considered a hint. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked "hourly" less frequently than that, and they may crawl pages marked "yearly" more frequently than that. Crawlers may periodically crawl pages marked "never" so that they can handle unexpected changes to those pages.
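Since `<changefreq>` accepts only the seven values listed above, a generator can reject anything else before writing the file. A minimal sketch, with `is_valid_changefreq` as a hypothetical helper name:

```python
# The seven values the protocol accepts for <changefreq>.
VALID_CHANGEFREQ = (
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
)

def is_valid_changefreq(value: str) -> bool:
    """Return True only for an exact match with one of the seven listed values."""
    return value in VALID_CHANGEFREQ

print(is_valid_changefreq("monthly"))
print(is_valid_changefreq("fortnightly"))
```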

<priority> optional

The priority of this URL relative to other URLs on your website. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other websites; it only lets search engines know which pages you consider most important for the crawlers.

The default priority of a page is 0.5.

The priority you assign to a page is not likely to influence the position of your URLs in a search engine's result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
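The range and default for `<priority>` can be enforced when the value is written. The sketch below is illustrative only; `format_priority` is a hypothetical helper, and printing one decimal place is an assumption that mirrors the 0.8 in the sample Sitemap.

```python
DEFAULT_PRIORITY = 0.5  # used when no priority is given

def format_priority(value=None) -> str:
    """Return the priority as text, falling back to the default of 0.5."""
    if value is None:
        value = DEFAULT_PRIORITY
    if not 0.0 <= value <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    # Assumption: one decimal place, as in the sample Sitemap.
    return f"{value:.1f}"

print(format_priority(0.8))
print(format_priority())
```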
