Problem: Suppose a MediaWiki site with about 10,000 pages is only partly indexed by search engines. It has similar specs as the specs.
Question: How can you generate sitemaps for any big site to guarantee its visibility in search engines?
MediaWiki sites keep all of their content in a relational database (RDBMS). The code for generating a sitemap basically just runs an SQL SELECT query to pull up the necessary information for every page. It is probably doable in a single SQL query that returns one row per page, so the code is fairly simple.
Any large site that uses a content management system (CMS) will have an equally easy time generating a sitemap, even if there are a million pages: query the database, then format the results into the appropriate sitemap format. It is pretty much the same kind of code as a search, but with one less WHERE clause (to return everything) and no pagination needed. The database type and schema can affect how easy this is, but in general a CMS will have the page name, the URL (well, the fields necessary to generate a URL), the modification date and similar details as fields in the database.
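As a rough sketch of the "query the database, format the results" idea, something like the following would do. The `pages` table, its `slug` and `modified` columns, and the in-memory SQLite database are all assumptions made up for illustration; a real CMS has its own schema. The 50,000-URL split reflects the per-file limit in the sitemap protocol.

```python
import sqlite3
from urllib.parse import quote
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemap protocol


def fetch_pages(conn):
    """Return (slug, modified) for every page: a search query with no WHERE filter."""
    # Hypothetical schema; adapt the table and column names to the actual CMS.
    return conn.execute("SELECT slug, modified FROM pages ORDER BY id").fetchall()


def build_sitemaps(rows, base_url):
    """Format rows as one or more <urlset> documents, split at the per-file limit."""
    documents = []
    for start in range(0, len(rows), MAX_URLS_PER_FILE):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for slug, modified in rows[start:start + MAX_URLS_PER_FILE]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = f"{base_url}/{quote(slug, safe='/:')}"
            ET.SubElement(url, "lastmod").text = modified
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents


# Demo: an in-memory database stands in for the CMS back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT, modified TEXT)")
conn.executemany(
    "INSERT INTO pages (slug, modified) VALUES (?, ?)",
    [("Main_Page", "2024-01-15"), ("Help:Contents", "2024-02-01")],
)
sitemaps = build_sitemaps(fetch_pages(conn), "http://example.org/wiki")
print(sitemaps[0])
```

With more than 50,000 rows, `build_sitemaps` returns several documents, which would then be listed in a sitemap index file.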
This question and your other two make it seem like you don't really understand that MediaWiki sites use a relational database, not a bunch of directories full of files.
Do you have a large site you're trying to generate sitemaps for? How is the data stored? Plain old-fashioned files on a filesystem?
Most public sites only have a few "pages" as far as the developer is concerned.
Server Fault, for example, probably only consists of about 20 different pages. This means that large portions of a sitemap can be generated dynamically from information in the back-end database, with a few extra pages added in statically.
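To make that concrete, here is a minimal sketch that combines a static list with URLs expanded from templates. The paths, templates and IDs are invented for illustration; in a real site the key lists would come from database queries.

```python
from urllib.parse import quote

# Hypothetical hand-maintained pages that do not come from the database.
STATIC_PATHS = ["/", "/about", "/faq"]


def sitemap_urls(base_url, static_paths, dynamic_sources):
    """Yield absolute URLs: static pages first, then one URL per database key."""
    for path in static_paths:
        yield base_url + path
    for template, keys in dynamic_sources:
        for key in keys:
            yield base_url + template.format(key=quote(str(key)))


# Each entry pairs a URL template (one "page" in the developer's sense)
# with the keys that a database query would return for it.
dynamic = [
    ("/questions/{key}", [101, 102]),
    ("/users/{key}", [7]),
]
urls = list(sitemap_urls("http://example.org", STATIC_PATHS, dynamic))
print("\n".join(urls))
```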
Check this blog post.
You have a few options.
If it is a large website built in-house, you would probably build your sitemaps from database queries. You also have the option to "googlebot" yourself using one of the various sitemap generators, which start on your homepage and crawl your entire website, automatically building sitemap files.
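For a feel of what such a crawler does internally, here is a simplified breadth-first sketch using only the Python standard library. It is not how any particular sitemap generator works; the `crawl` and `fetch_html` names are made up here, and a real tool would also honour robots.txt, rate limits and redirects. The demo feeds it a small in-memory "site" so it runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page over HTTP (default fetcher; not used in the demo below)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=1000):
    """Breadth-first crawl of same-host links, returning URLs in discovery order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Demo: a dict stands in for the web server.
site = {
    "http://example.org/": '<a href="/a">A</a> <a href="http://other.org/">ext</a>',
    "http://example.org/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "http://example.org/b": "",
}
pages = crawl("http://example.org/", fetch=site.__getitem__)
print(pages)
```

The returned list can then be written out as sitemap `<url>` entries. Note that a crawl only finds pages reachable by links, which is one reason the database approach tends to be more complete.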
We use this software for crawling and building sitemaps for some large websites:
It's very simple.
    php maintenance/generateSitemap.php \
        --fspath sitemap \
        --server http://example.org \
        --urlpath http://example.org/sitemap
See the generateSitemap.php manual for more information.
The DIY solutions suggested by the other answers are suboptimal.