Diffstat (limited to 'zarb-ml/mageia-sysadm/2012-April/004366.html')
-rw-r--r-- | zarb-ml/mageia-sysadm/2012-April/004366.html | 299 |
1 file changed, 299 insertions, 0 deletions
diff --git a/zarb-ml/mageia-sysadm/2012-April/004366.html b/zarb-ml/mageia-sysadm/2012-April/004366.html new file mode 100644 index 000000000..ad85746a2 --- /dev/null +++ b/zarb-ml/mageia-sysadm/2012-April/004366.html @@ -0,0 +1,299 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> +<HTML> + <HEAD> + <TITLE> [Mageia-sysadm] questions about our infrastructure setup & costs + </TITLE> + <LINK REL="Index" HREF="index.html" > + <LINK REL="made" HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C20120402190029.GA11367%40sisay.ephaone.org%3E"> + <META NAME="robots" CONTENT="index,nofollow"> + <META http-equiv="Content-Type" content="text/html; charset=us-ascii"> + <LINK REL="Previous" HREF="004359.html"> + <LINK REL="Next" HREF="004374.html"> + </HEAD> + <BODY BGCOLOR="#ffffff"> + <H1>[Mageia-sysadm] questions about our infrastructure setup & costs</H1> + <B>Michael scherer</B> + <A HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C20120402190029.GA11367%40sisay.ephaone.org%3E" + TITLE="[Mageia-sysadm] questions about our infrastructure setup & costs">misc at zarb.org + </A><BR> + <I>Mon Apr 2 21:00:30 CEST 2012</I> + <P><UL> + <LI>Previous message: <A HREF="004359.html">[Mageia-sysadm] questions about our infrastructure setup & costs +</A></li> + <LI>Next message: <A HREF="004374.html">[Mageia-sysadm] questions about our infrastructure setup & costs +</A></li> + <LI> <B>Messages sorted by:</B> + <a href="date.html#4366">[ date ]</a> + <a href="thread.html#4366">[ thread ]</a> + <a href="subject.html#4366">[ subject ]</a> + <a href="author.html#4366">[ author ]</a> + </LI> + </UL> + <HR> +<!--beginarticle--> +<PRE>On Mon, Apr 02, 2012 at 06:02:48PM +0200, Romain d'Alverny wrote: +><i> On Mon, Apr 2, 2012 at 16:59, Michael Scherer <<A 
HREF="https://www.mageia.org/mailman/listinfo/mageia-sysadm">misc at zarb.org</A>> wrote:
+</I>><i> > On Monday 02 April 2012 at 15:23 +0200, Romain d'Alverny wrote:
+</I>><i> > That's a rather odd question, since with your treasurer hat, you should
+</I>><i> > have all the info, so I do not really see what we can answer to that.
+</I>><i>
+</I>><i> If I ask, it's because I don't. *\o/*
+</I>
+Well, you have the hardware we paid for, no?
+The accounting is on the bank account, and despite requiring some searching,
+it was published.
+
+If you need information on the hardware we got at the beginning, I can send
+to this list a partial history of where every piece of hardware came from,
+if that's what you need, but I would prefer to be sure that's really what
+you need before spending an afternoon on this.
+
+When they were set up, I proposed deploying GLPI to have an inventory done
+automatically, but I was told that a hand-made one should be sufficient, so I
+didn't. The hand-made inventory never appeared.
+
+We can publish the YAML files from puppet, which give enough information for
+someone wanting an inventory now: MAC addresses, BIOS information, serial
+numbers, memory, etc.
+Or just run a for loop with lshw and push the output somewhere, so people can
+see what hardware we have.
+
+Or deploy glpi/fusion-inventory/etc., since they are packaged, if that's also
+the type of information you need.
+
+><i> And it would be greatly appreciated if sysadmins deciphered these things
+</I>><i> for a more accessible understanding of the infrastructure, not only
+</I>><i> for me, but for other future people that may not have your technical
+</I>><i> background.
+</I>
+I still do not know what to answer.
+Basically:
+
+Ecosse and jonund are purely for the buildsystem. 
+<A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/ecosse.pp?revision=2708&view=markup">http://svnweb.mageia.org/adm/puppet/manifests/nodes/ecosse.pp?revision=2708&view=markup</A>
+
+
+Alamut is where all our web applications are running.
+See the zone file, as it reflects exactly what is running where:
+<A HREF="http://svnweb.mageia.org/adm/puppet/deployment/dns/templates/mageia.org.zone?revision=2456&view=markup">http://svnweb.mageia.org/adm/puppet/deployment/dns/templates/mageia.org.zone?revision=2456&view=markup</A>
+
+The only exceptions are the blog and planet, on krampouezh and champagne
+( one with the mysql db, the other with the application ).
+
+Champagne also holds most of the static websites.
+
+Mailing lists are on alamut too.
+postgresql is on alamut, and so is the computing of our mail aliases
+( ie, postfix + spamassassin, etc ).
+
+
+Valstar controls the buildsystem ( job dispatch ),
+serves as puppetmaster, and hosts git, svn and the ldap master.
+
+
+So the buildsystem is roughly 3 servers + an arm board, until we start to use
+postgresql, in which case we would have to take alamut into account.
+
+We can also take rabbit into account for the iso production.
+
+><i> > The list of servers is in puppet :
+</I>><i> > <A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/">http://svnweb.mageia.org/adm/puppet/manifests/nodes/</A>
+</I>><i> >
+</I>><i> > and each has some modules assigned to it; take those as functional chunks.
+</I>><i> >
+</I>><i> > Unfortunately, the servers are all doing more than one task, so
+</I>><i> > splitting them into functional chunks does not mean much.
+</I>><i>
+</I>><i> Yes it does. That's exactly this other view that I'm asking for.
+</I>><i> Because it makes sense to know what the "buildsystem" chunk/unit
+</I>><i> costs, as a whole, in storage/bandwidth/hosting options, in comparison
+</I>><i> with the "user accounts" one, with the "Web sites" one, with the
+</I>><i> "mailing-lists" one, etc. 
so that we can consider different hosting
+</I>><i> options for each of them.
+</I>
+We do not have accounting per chunk, so the bandwidth figures would be aggregated.
+We can add accounting, but that would require some iptables trickery, and
+we should let it run for a month to have proper data.
+
+For the housing, we would need the size of each server ( I can provide
+pictures of each server if someone wants to count ) and the power consumption,
+and I do not know how to get the latter.
+
+Storage is the only thing we can get:
+svn is 180 GB ( 82 on alamut, ie without the svn for binaries )
+distrib tree is 550 GB
+bin repo is 100 GB
+
+the whole postgresql setup is 1 GB
+sympa archives are 5.8 GB
+
+The rest likely does not need to be counted for storage.
+
+And for the specs of the servers, see the start of this mail.
+
+><i> It makes sense too to have a dependencies graph at some level to
+</I>><i> quickly explain/see how our infrastructure components work with each
+</I>><i> other.
+</I>
+We have done a not-so-bad job of mutualisation; for example,
+both bugzilla and sympa use the same ldap and the same postgresql.
+
+Splitting them would mean more work ( the current setup needs some
+work to use more than a single sql server, for example, but it could be done;
+there are some hooks for that ).
+
+><i> Because, these considerations need to be understood/thought out/taken
+</I>><i> also by people that are _not_ sysadmins, such as persons that may
+</I>><i> focus on organizing our financial resources or contacts or donations
+</I>><i> to that end.
+</I>><i>
+</I>><i> Think about this as "enabling other community members to grasp what it
+</I>><i> takes to make Mageia.org function and to contribute means to that
+</I>><i> end". 
+</I>><i>
+</I>><i> For instance, if we can have functional chunks, we may decide:
+</I>><i> - which ones are critical (and should be moved to a solution that has
+</I>><i> the greatest possible availability): for instance LDAP, identity,
+</I>><i> mailing-lists, www, blogs;
+</I>><i>
+</I>><i> - which ones may more safely shut down without shutting down other
+</I>><i> independent services from a functional point of view (for instance,
+</I>><i> the buildsystem could go down without injuring the code repository itself,
+</I>><i> or the mailing-lists; code repositories may shut down too, if a later,
+</I>><i> read-only copy is available somewhere else);
+</I>><i> - which ones may be redundant and how.
+</I>
+So on the redundancy topic, there are 2 parts:
+- redundant hardware ( RAID, double power supplies, double ethernet cards, etc )
+- redundant software
+
+For the hardware part, we have several limitations, the most notable being that
+we have no spare servers, nor warranty on most of them. And since they are far
+away, the only documentation is photos I took that are currently on my phone
+( because we didn't have time to do everything that was planned last time and
+finished just in time ).
+
+For the software part, most services would require a redundant filesystem.
+More than half of our services depend on a single filesystem:
+- Puppetmaster depends on a sqlite database to work ( the plan is to migrate to
+postgresql sooner or later, as sqlite does not scale ). So for now, we cannot make it redundant.
+- identity depends on ldap write access, and we didn't set it up this way ( ie, there is 1 single master )
+- epoll depends on FS, and postgresql
+- bugzilla depends on FS ( for attachments ), and postgresql (mostly R/W)
+- transifex depends on FS ( for files ), and postgresql (R/W)
+- buildsystem depends on FS ( for the queuing of jobs ), but builders are redundant
+- svn and git depend on a FS ( but there is a replica on alamut for viewvc ). 
+- viewvc can be made redundant without trouble
+- planet and blog depend on FS ( for pictures, cache ) and mysql. Planet can however
+be made redundant quite easily.
+- mga::mirrors depends on postgresql, but read-only, so it could be made redundant without trouble
+- sympa depends on FS and postgresql, read/write
+- postfix can be made redundant, and already is ( could be improved however )
+- xymon depends on FS
+- mediawiki depends on FS ( attachments ) and postgresql ( R/W )
+- youri report depends on postgresql, and is stateless so it can be started from scratch again
+- pkgcpan is stateless too, so it can be moved somewhere else fast, like most static websites
+( hugs, releases, www, static )
+- dns is fully redundant and can be deployed fast, provided puppetmaster is not impacted
+- the main mirror can be duplicated, that's the goal of a mirror, but it would require contacting several people
+- maintdb depends on FS
+
+So we can:
+- make mga::mirrors redundant ( would be a good idea in fact )
+  - requires a redundant setup for postgresql
+  - round robin DNS
+  - make sure urpmi/drakx work fine in case of failure
+
+- make planet and viewvc redundant. Not so useful IMHO
+
+- improve postfix redundancy ( ie letting krampouezh distribute mail to aliases )
+  - would require a small tweak to the postfix module, but it is currently a bit messy
+
+- make puppetmaster scalable and redundant
+  - needs someone to fix thin, and then I can move sqlite to postgresql using
+  some ruby script
+
+- improve the identity/catdap/ldap setup to have it work in case of problems
+  - applications should be tested for proper failover
+  - ldap should be set up in a multi-master setup
+
+but again, I am not sure identity is the most important thing to improve, and ldap
+is already doubled, at least as read-only. 
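+For the round-robin DNS piece of the mga::mirrors idea above, the change is
+just publishing several A records under one name in the zone file. A minimal
+sketch, with made-up example addresses ( the real records live in the
+mageia.org.zone template linked earlier ):

```zone
; hypothetical round-robin records: resolvers rotate the order of the
; answers, spreading clients across both backends
mirrors   IN A 192.0.2.10   ; first backend (example address)
mirrors   IN A 192.0.2.20   ; second backend (example address)
```

+Round robin alone does not notice a dead backend, which is why the point
+above about urpmi/drakx coping with a failing answer matters.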
+
+And for everything that depends on the FS, the problem is simple:
+- if the filesystem is toast, we cannot do much, except restore from backups
+- if we want to make those services redundant, we need to make the FS redundant and shared,
+and make sure the software supports it. For example, I am not sure that svn would
+work if the repository was shared over nfs ( maybe that has changed ). Same goes for
+sqlite.
+
+There are various ways of doing this. We can either do it at the filesystem level
+( lustre, gfs, ocfs2, gluster ), or export over nfs from some NAS ( like netapp ).
+Solutions at the hardware level are expensive. Solutions at the filesystem level
+can be divided into 2 categories:
+- those I do not know
+- those where people told me they are cra^W^W have a lot of potential to be improved
+
+As a side note, I doubt we ship the required userspace tools in Mageia for the 4 I named.
+
+Another solution would be to work on fast failure recovery, ie being able to reinstall a
+server fast ( that's half done by puppet; backup restoration should be the other half ).
+
+Or to work in a way that is less centralized ( for example, for the buildsystem,
+using git instead of svn for packages, or using a more scalable system than the current
+one, for example something based on amqp with redundant queues, etc ).
+There are distributed bugtrackers ( ditz, SD, Bugs Everywhere, see <A HREF="https://lwn.net/Articles/281849/">https://lwn.net/Articles/281849/</A> ),
+and there are distributed wikis ( mostly based on git, bzr ), so we could have a different
+set of services that would work fine in case of failure.
+
+><i> Yes, of course, some systems are already split up like this (www, blog,
+</I>><i> ldap are not in Marseille here). But apart from sysadmin, no one
+</I>><i> knows. No one knows either what the plans are.
+</I>
+Because no one took the time to even search, or to post on the mailing list,
+which is sad. 
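+As a sketch of the quick-and-dirty inventory idea mentioned earlier ( loop over
+the servers, run lshw, push the output somewhere ): collating the JSON that
+`lshw -json` emits only needs a small script. This is an illustration, not our
+tooling; it assumes each host's output was already saved, and that the output is
+the usual single nested object with "class"/"product"/"serial"/"children" keys:

```python
import json

def walk(node):
    """Yield every hardware node in lshw's nested JSON tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)

def summarize(lshw_json):
    """Keep the fields useful for an inventory: class, product, serial."""
    root = json.loads(lshw_json)
    return [
        (n.get("class"), n.get("product"), n.get("serial"))
        for n in walk(root)
        if n.get("serial")  # only keep nodes that carry a serial number
    ]

# Tiny fabricated sample of lshw -json output, for illustration only
sample = json.dumps({
    "id": "valstar", "class": "system", "product": "PowerEdge", "serial": "ABC123",
    "children": [
        {"id": "core", "class": "bus",
         "children": [{"id": "memory", "class": "memory",
                       "product": "DIMM DDR2", "serial": "M42"}]},
    ],
})

for klass, product, serial in summarize(sample):
    print(klass, product, serial)
```

+Pushing one such summary per host to a shared place would already give people
+something to grasp, without waiting for a GLPI/fusion-inventory deployment.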
+
+All current solutions for a redundant setup that I can think of would require
+more than hardware, so no matter how long I explain the current setup, it would
+unfortunately still require someone with enough sysadmin skills to finish
+the job.
+
+Hence the proposal <A HREF="https://wiki.mageia.org/en/SummerOfCode2012#Ease_the_integration_of_new_sysadmins_.28draft.29">https://wiki.mageia.org/en/SummerOfCode2012#Ease_the_integration_of_new_sysadmins_.28draft.29</A>
+
+
+><i> Splitting by function and dependency seems a good way to know what the
+</I>><i> system does, and how we can lay it out, not in one or two data
+</I>><i> centers, but more globally.
+</I>
+I am not saying it's not a good idea to document this ( I started to write
+various SOPs on the wiki ), just that I fear that no one will
+be motivated to do it and to keep it up to date, especially for such a huge
+document.
+
+--
+Michael Scherer
+</PRE>




<!--endarticle-->
 <HR>
 <P><UL>
 <!--threads-->
 <LI>Previous message: <A HREF="004359.html">[Mageia-sysadm] questions about our infrastructure setup & costs
</A></li>
 <LI>Next message: <A HREF="004374.html">[Mageia-sysadm] questions about our infrastructure setup & costs
</A></li>
 <LI> <B>Messages sorted by:</B>
 <a href="date.html#4366">[ date ]</a>
 <a href="thread.html#4366">[ thread ]</a>
 <a href="subject.html#4366">[ subject ]</a>
 <a href="author.html#4366">[ author ]</a>
 </LI>
 </UL>

<hr>
<a href="https://www.mageia.org/mailman/listinfo/mageia-sysadm">More information about the Mageia-sysadm
mailing list</a><br>
</body></html>