How are the sitemaps of big MediaWiki sites generated?

by user8926   Last Updated April 06, 2015 17:01

This question is related to my earlier questions about the filesystem of Wikipedia and about generating a sitemap for Wikipedia, but it is more general.

Problem: Suppose a MediaWiki site with about 10,000 pages that is only partly indexed by search engines. Its specs are similar to those described in my earlier questions.

Question: How can you generate sitemaps for any big site to guarantee its visibility in search engines?

5 Answers

MediaWiki sites keep all of their content in a relational database (RDBMS). The code for generating a sitemap basically just runs an SQL SELECT query to pull up the necessary information for every page. It is probably doable in a single SQL query that returns one row per page. The code for that is fairly simple, really.

Any large site that uses a content management system (CMS) will have an equally easy time generating a sitemap, even if there are a million pages: query the database, then format the results into the appropriate sitemap format. It is pretty much the same kind of code as a search, but with one less WHERE clause (to return everything) and no pagination needed. The database type and schema can affect how easy this is, but in general a CMS will have the page name, the URL (or at least the fields necessary to generate a URL), the modification date, and similar attributes as fields in the database.
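The query-and-format approach above can be sketched in a few lines of Python. This is a minimal illustration, not MediaWiki's actual code: an in-memory SQLite table stands in for a simplified version of MediaWiki's `page` table (the real schema has many more columns), and the base URL is a placeholder.

```python
import sqlite3
from xml.sax.saxutils import escape

# Simplified stand-in for MediaWiki's `page` table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_namespace INTEGER,
    page_title TEXT,
    page_touched TEXT)""")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(0, "Main_Page", "2015-04-06"),
     (0, "Sitemaps", "2015-03-01")])

def build_sitemap(conn, base_url):
    """One SELECT returning one row per content page, formatted as sitemap XML."""
    rows = conn.execute(
        "SELECT page_title, page_touched FROM page WHERE page_namespace = 0")
    entries = [
        "  <url><loc>{}/wiki/{}</loc><lastmod>{}</lastmod></url>".format(
            base_url, escape(title), touched)
        for title, touched in rows]
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries) + "\n</urlset>")

print(build_sitemap(conn, "http://example.org"))
```

Even for a million pages this stays cheap, because it is a single sequential scan with no joins or pagination.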

This question and your other two make it seem like you don't realize that MediaWiki sites use a relational database, not a bunch of directories full of files.

Do you have a large site you're trying to generate sitemaps for? How is the data stored? Plain old-fashioned files on a filesystem?

July 12, 2009 23:55 PM

Most public sites only have a few "pages" as far as the developer is concerned.

Server Fault, for example, probably only consists of about 20 different pages. What this means is that large portions of site maps can be generated dynamically based on information in the back end database and then a few extra pages are added in statically.

Spencer Ruport
July 12, 2009 23:56 PM

You have a few options.

If it is a large website built in-house, you would probably build your sitemaps based on database queries. You also have the option to "googlebot" yourself using various sitemap generators that will start on your homepage and crawl your entire website -- automatically building sitemap files.
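The crawler approach can be sketched with the standard library alone. To keep the sketch runnable without a network, a dictionary stands in for the website; a real sitemap generator would fetch each URL over HTTP instead.

```python
from html.parser import HTMLParser

# Toy in-memory "website": path -> HTML body.
SITE = {
    "/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/">home</a>',
    "/b": '<a href="/a">A</a>',
}

class LinkParser(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href"]

def crawl(start="/"):
    """Breadth of reachable pages, starting from the homepage."""
    seen, todo = set(), [start]
    while todo:
        path = todo.pop()
        if path in seen or path not in SITE:
            continue
        seen.add(path)
        parser = LinkParser()
        parser.feed(SITE[path])
        todo.extend(parser.links)
    return sorted(seen)

print(crawl())
```

Every page the crawl discovers becomes a sitemap entry, which is exactly what the off-the-shelf sitemap generators do, just with HTTP fetching, politeness delays, and robots.txt handling added.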

We use this software for crawling and building sitemaps for some large websites:

Matt Beckman
March 02, 2011 19:37 PM

It's very simple.

php maintenance/generateSitemap.php \
   --fspath sitemap \
   --server http://example.org   # replace with your wiki's base URL

See the generateSitemap.php manual for more information.

The DIY solutions suggested by the other answers are suboptimal.

April 06, 2015 08:50 AM
