Diffstat (limited to 'zarb-ml/mageia-sysadm/2012-April/004366.html')
-rw-r--r-- | zarb-ml/mageia-sysadm/2012-April/004366.html | 299 |
1 file changed, 299 insertions, 0 deletions
diff --git a/zarb-ml/mageia-sysadm/2012-April/004366.html b/zarb-ml/mageia-sysadm/2012-April/004366.html new file mode 100644 index 000000000..ad85746a2 --- /dev/null +++ b/zarb-ml/mageia-sysadm/2012-April/004366.html @@ -0,0 +1,299 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> +<HTML> + <HEAD> + <TITLE> [Mageia-sysadm] questions about our infrastructure setup & costs + </TITLE> + <LINK REL="Index" HREF="index.html" > + <LINK REL="made" HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C20120402190029.GA11367%40sisay.ephaone.org%3E"> + <META NAME="robots" CONTENT="index,nofollow"> + <META http-equiv="Content-Type" content="text/html; charset=us-ascii"> + <LINK REL="Previous" HREF="004359.html"> + <LINK REL="Next" HREF="004374.html"> + </HEAD> + <BODY BGCOLOR="#ffffff"> + <H1>[Mageia-sysadm] questions about our infrastructure setup & costs</H1> + <B>Michael scherer</B> + <A HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C20120402190029.GA11367%40sisay.ephaone.org%3E" + TITLE="[Mageia-sysadm] questions about our infrastructure setup & costs">misc at zarb.org + </A><BR> + <I>Mon Apr 2 21:00:30 CEST 2012</I> + <P><UL> + <LI>Previous message: <A HREF="004359.html">[Mageia-sysadm] questions about our infrastructure setup & costs +</A></li> + <LI>Next message: <A HREF="004374.html">[Mageia-sysadm] questions about our infrastructure setup & costs +</A></li> + <LI> <B>Messages sorted by:</B> + <a href="date.html#4366">[ date ]</a> + <a href="thread.html#4366">[ thread ]</a> + <a href="subject.html#4366">[ subject ]</a> + <a href="author.html#4366">[ author ]</a> + </LI> + </UL> + <HR> +<!--beginarticle--> +<PRE>On Mon, Apr 02, 2012 at 06:02:48PM +0200, Romain d'Alverny wrote: +><i> On Mon, Apr 2, 2012 at 16:59, Michael Scherer <<A 
HREF="https://www.mageia.org/mailman/listinfo/mageia-sysadm">misc at zarb.org</A>> wrote:
+</I>><i> > On Monday 02 April 2012 at 15:23 +0200, Romain d'Alverny wrote:
+</I>><i> > That's a rather odd question, since with your treasurer hat, you should
+</I>><i> > have all the info, so I do not really see what we can answer to that.
+</I>><i>
+</I>><i> If I ask, it's because I don't. *\o/*
+</I>
+Well, you have the hardware we paid for, no?
+The accounting is on the bank account, and despite requiring some searching,
+it was published.
+
+If you need information on the hardware we got at the beginning, I can send
+to this list a partial history of where every piece of hardware came from,
+if that's what you need, but I would prefer to be sure that's really what
+you need before spending an afternoon on this.
+
+When they were set up, I proposed deploying GLPI to have an inventory done
+automatically, but I was told that a hand-made one should be sufficient, so I
+didn't. The hand-made inventory never appeared.
+
+We can publish the YAML files from puppet, which give enough information for
+someone wanting an inventory now: MAC addresses, BIOS information, serial
+numbers, memory, etc.
+Or just run a for loop with lshw and push the output somewhere, so people can
+see what hardware we have.
+
+Or deploy glpi/fusion-inventory/etc., since they are packaged, if that's also
+the type of information you need.
+
+><i> And it would be greatly appreciated if sysadmins deciphered these things
+</I>><i> for a more accessible understanding of the infrastructure, not only
+</I>><i> for me, but for other future people that may not have your technical
+</I>><i> background.
+</I>
+I still do not know what to answer.
+Basically:
+
+Ecosse and jonund are purely for the buildsystem. 
+<A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/ecosse.pp?revision=2708&view=markup">http://svnweb.mageia.org/adm/puppet/manifests/nodes/ecosse.pp?revision=2708&view=markup</A>
+
+
+Alamut is where all our web applications are running.
+See the zone file, as it reflects exactly what is running where:
+<A HREF="http://svnweb.mageia.org/adm/puppet/deployment/dns/templates/mageia.org.zone?revision=2456&view=markup">http://svnweb.mageia.org/adm/puppet/deployment/dns/templates/mageia.org.zone?revision=2456&view=markup</A>
+
+The only exceptions are the blog and planet, on krampouezh and champagne
+( one with the mysql db, the other with the application ).
+
+Champagne also holds most of the static websites.
+
+Mailing lists are on alamut too.
+postgresql is on alamut, and so is the computing of our mail aliases
+( ie, postfix + spamassassin, etc ).
+
+
+Valstar controls the buildsystem ( job dispatch ),
+serves as puppetmaster, and hosts git, svn and the ldap master.
+
+
+So the buildsystem is roughly 3 servers + an arm board, until we start to use
+postgresql, in which case we would have to take alamut into account.
+
+We can also take rabbit into account for the iso production.
+
+><i> > The list of servers is in puppet :
+</I>><i> > <A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/">http://svnweb.mageia.org/adm/puppet/manifests/nodes/</A>
+</I>><i> >
+</I>><i> > and each has some modules assigned to it; take those as functional chunks.
+</I>><i> >
+</I>><i> > Unfortunately, the servers are all doing more than one task, so
+</I>><i> > splitting them into functional chunks does not mean much.
+</I>><i>
+</I>><i> Yes it does. That's exactly this other view that I'm asking for.
+</I>><i> Because it makes sense to know what the "buildsystem" chunk/unit
+</I>><i> costs, as a whole, in storage/bandwidth/hosting options, in comparison
+</I>><i> with the "user accounts" one, with the "Web sites" one, with the
+</I>><i> "mailing-lists" one, etc. 
so that we can consider different hosting
+</I>><i> options for each of them.
+</I>
+We do not have accounting per chunk, so the bandwidth figures would be aggregated.
+We can add accounting, but that would require some iptables trickery, and
+we should let it run for a month to have proper data.
+
+For the housing, we would need the size of each server ( I can provide
+pictures of each server if someone wants to count ) and the power consumption,
+and I do not know how to get the latter.
+
+Storage is the only thing we can get:
+svn is 180 GB ( 82 on alamut, ie without the svn for binaries )
+distrib tree is 550 GB
+bin repo is 100 GB
+
+the whole postgresql setup is 1 GB
+sympa archives are 5.8 GB
+
+The rest likely does not need to be counted for storage.
+
+And for the specs of the servers, see the start of this mail.
+
+><i> It makes sense too to have a dependencies graph at some level to
+</I>><i> quickly explain/see how our infrastructure components work with each
+</I>><i> other.
+</I>
+We have done a not-so-bad job of mutualisation; for example,
+both bugzilla and sympa use the same ldap and the same postgresql.
+
+Splitting them would mean more work ( the current setup needs some
+work to use more than a single sql server, for example, but it could be done;
+there are some hooks for that ).
+
+><i> Because, these considerations need to be understood/thought out/taken
+</I>><i> also by people that are _not_ sysadmins, such as persons that may
+</I>><i> focus on organizing our financial resources or contacts or donations
+</I>><i> to that end.
+</I>><i>
+</I>><i> Think about this as "enabling other community members to grasp what it
+</I>><i> takes to make Mageia.org function and to contribute means to that
+</I>><i> end". 
+</I>><i>
+</I>><i> For instance, if we can have functional chunks, we may decide:
+</I>><i> - which ones are critical (and should be moved to a solution that has
+</I>><i> the greatest possible availability): for instance LDAP, identity,
+</I>><i> mailing-lists, www, blogs;
+</I>><i>
+</I>><i> - which ones may more safely shut down without shutting down other
+</I>><i> independent services from a functional point of view (for instance,
+</I>><i> the buildsystem could go down without injuring the code repository itself,
+</I>><i> or the mailing-lists; code repositories may shut down too, if a later,
+</I>><i> read-only copy is available somewhere else);
+</I>><i> - which ones may be redundant and how.
+</I>
+So on the redundancy topic, there are 2 parts:
+- redundant hardware ( RAID, double power supplies, double ethernet cards, etc )
+- redundant software
+
+For the hardware part, we have several limitations, the most notable being that
+we have no spare servers, nor warranty on most of them. And since they are far
+away, the only documentation is photos I took that are currently on my phone
+( because we didn't have time to do everything that was planned last time and
+finished just in time ).
+
+For the software part, most services would require a redundant filesystem.
+More than half of our services depend on a single filesystem:
+- Puppetmaster depends on a sqlite database to work ( the plan is to migrate to
+postgresql sooner or later, as sqlite does not scale ). So for now, we cannot make it redundant.
+- identity depends on ldap write access, and we didn't set it up this way ( ie, there is 1 single master )
+- epoll depends on FS, and postgresql
+- bugzilla depends on FS ( for attachments ), and postgresql (mostly R/W)
+- transifex depends on FS ( for files ), and postgresql (R/W)
+- buildsystem depends on FS ( for the queuing of jobs ), but builders are redundant
+- svn and git depend on a FS ( but there is a replica on alamut for viewvc ). 
+- viewvc can be made redundant without trouble
+- planet and blog depend on FS ( for pictures, cache ) and mysql. Planet can however
+be made redundant quite easily.
+- mga::mirrors depends on postgresql, but read-only, so it could be made redundant without trouble
+- sympa depends on FS and postgresql, read/write
+- postfix can be made redundant, and already is ( could be improved however )
+- xymon depends on FS
+- mediawiki depends on FS ( attachments ) and postgresql ( R/W )
+- youri report depends on postgresql, and is stateless so it can be started from scratch again
+- pkgcpan is stateless too, so it can be moved somewhere else fast, like most static websites
+( hugs, releases, www, static )
+- dns is fully redundant and can be deployed fast, provided puppetmaster is not impacted
+- the main mirror can be duplicated, that's the goal of a mirror, but it would require contacting several people
+- maintdb depends on FS
+
+So we can:
+- make mga::mirrors redundant ( would be a good idea in fact )
+  - requires a redundant setup for postgresql
+  - round robin DNS
+  - make sure urpmi/drakx work fine in case of failure
+
+- make planet and viewvc redundant. Not so useful IMHO
+
+- improve postfix redundancy ( ie letting krampouezh distribute mail to aliases )
+  - would require a small tweak to the postfix module, but it is currently a bit messy
+
+- make puppetmaster scalable and redundant
+  - needs someone to fix thin, and then I can move sqlite to postgresql using
+  some ruby script
+
+- improve the identity/catdap/ldap setup to have it work in case of problems
+  - applications should be tested for proper failover
+  - ldap should be set up in a multi-master setup
+
+but again, I am not sure identity is the most important thing to improve, and ldap
+is already doubled, at least as read-only. 
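+For the round-robin DNS piece of the mga::mirrors idea above, the change is
+just publishing several A records under one name in the zone file. A minimal
+sketch, with made-up example addresses ( the real records live in the
+mageia.org.zone template linked earlier ):

```zone
; hypothetical round-robin records: resolvers rotate the order of the
; answers, spreading clients across both backends
mirrors   IN A 192.0.2.10   ; first backend (example address)
mirrors   IN A 192.0.2.20   ; second backend (example address)
```

+Round robin alone does not notice a dead backend, which is why the point
+above about urpmi/drakx coping with a failing answer matters.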
+
+And for everything that depends on the FS, the problem is simple:
+- if the filesystem is toast, we cannot do much, except restore from backups
+- if we want to make those services redundant, we need to make the FS redundant and shared,
+and make sure the software supports it. For example, I am not sure that svn would
+work if the repository was shared over nfs ( maybe that has changed ). Same goes for
+sqlite.
+
+There are various ways of doing this. We can either do it at the filesystem level
+( lustre, gfs, ocfs2, gluster ), or export over nfs from some NAS ( like netapp ).
+Solutions at the hardware level are expensive. Solutions at the filesystem level
+can be divided into 2 categories:
+- those I do not know
+- those where people told me they are cra^W^W have a lot of potential to be improved
+
+As a side note, I doubt we ship the required userspace tools in Mageia for the 4 I named.
+
+Another solution would be to work on fast failure recovery, ie being able to reinstall a
+server fast ( that's half done by puppet; backup restoration should be the other half ).
+
+Or to work in a way that is less centralized ( for example, for the buildsystem,
+using git instead of svn for packages, or using a more scalable system than the current
+one, for example something based on amqp with redundant queues, etc ).
+There are distributed bugtrackers ( ditz, SD, Bugs Everywhere, see <A HREF="https://lwn.net/Articles/281849/">https://lwn.net/Articles/281849/</A> ),
+and there are distributed wikis ( mostly based on git, bzr ), so we could have a different
+set of services that would work fine in case of failure.
+
+><i> Yes, of course, some systems are already split up like this (www, blog,
+</I>><i> ldap are not in Marseille here). But apart from sysadmin, no one
+</I>><i> knows. No one knows either what the plans are.
+</I>
+Because no one took the time to even search, or to post on the mailing list,
+which is sad. 
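+As a sketch of the quick-and-dirty inventory idea mentioned earlier ( loop over
+the servers, run lshw, push the output somewhere ): collating the JSON that
+`lshw -json` emits only needs a small script. This is an illustration, not our
+tooling; it assumes each host's output was already saved, and that the output is
+the usual single nested object with "class"/"product"/"serial"/"children" keys:

```python
import json

def walk(node):
    """Yield every hardware node in lshw's nested JSON tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)

def summarize(lshw_json):
    """Keep the fields useful for an inventory: class, product, serial."""
    root = json.loads(lshw_json)
    return [
        (n.get("class"), n.get("product"), n.get("serial"))
        for n in walk(root)
        if n.get("serial")  # only keep nodes that carry a serial number
    ]

# Tiny fabricated sample of lshw -json output, for illustration only
sample = json.dumps({
    "id": "valstar", "class": "system", "product": "PowerEdge", "serial": "ABC123",
    "children": [
        {"id": "core", "class": "bus",
         "children": [{"id": "memory", "class": "memory",
                       "product": "DIMM DDR2", "serial": "M42"}]},
    ],
})

for klass, product, serial in summarize(sample):
    print(klass, product, serial)
```

+Pushing one such summary per host to a shared place would already give people
+something to grasp, without waiting for a GLPI/fusion-inventory deployment.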
+
+All current solutions for a redundant setup that I can think of would require
+more than hardware, so no matter how long I explain the current setup, it would
+unfortunately still require someone with enough sysadmin skills to finish
+the job.
+
+Hence the proposal <A HREF="https://wiki.mageia.org/en/SummerOfCode2012#Ease_the_integration_of_new_sysadmins_.28draft.29">https://wiki.mageia.org/en/SummerOfCode2012#Ease_the_integration_of_new_sysadmins_.28draft.29</A>
+
+
+><i> Splitting by function and dependency seems a good way to know what the
+</I>><i> system does, and how we can lay it out, not in one or two data
+</I>><i> centers, but more globally.
+</I>
+I am not saying it's not a good idea to document this ( I started to write
+various SOPs on the wiki ), just that I fear that no one will
+be motivated to do it and to keep it up to date, especially for such a huge
+document.
+
+--
+Michael Scherer
+</PRE>




<!--endarticle-->
 <HR>
 <P><UL>
 <!--threads-->
 <LI>Previous message: <A HREF="004359.html">[Mageia-sysadm] questions about our infrastructure setup & costs
</A></li>
 <LI>Next message: <A HREF="004374.html">[Mageia-sysadm] questions about our infrastructure setup & costs
</A></li>
 <LI> <B>Messages sorted by:</B>
 <a href="date.html#4366">[ date ]</a>
 <a href="thread.html#4366">[ thread ]</a>
 <a href="subject.html#4366">[ subject ]</a>
 <a href="author.html#4366">[ author ]</a>
 </LI>
 </UL>

<hr>
<a href="https://www.mageia.org/mailman/listinfo/mageia-sysadm">More information about the Mageia-sysadm
mailing list</a><br>
</body></html>