1 files changed, 384 insertions, 0 deletions
diff --git a/zarb-ml/mageia-sysadm/2012-April/004369.html b/zarb-ml/mageia-sysadm/2012-April/004369.html
new file mode 100644
index 000000000..2d3511fc8
--- /dev/null
+++ b/zarb-ml/mageia-sysadm/2012-April/004369.html
@@ -0,0 +1,384 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+ <HEAD>
+   <TITLE> [Mageia-sysadm] questions about our infrastructure setup &amp; costs
+   </TITLE>
+   <LINK REL="Index" HREF="index.html" >
+   <LINK REL="made" HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C201204022213.00278.bgmilne%40zarb.org%3E">
+   <META NAME="robots" CONTENT="index,nofollow">
+   <META http-equiv="Content-Type" content="text/html; charset=us-ascii">
+   <LINK REL="Previous"  HREF="004393.html">
+   <LINK REL="Next"  HREF="004376.html">
+ </HEAD>
+ <BODY BGCOLOR="#ffffff">
+   <H1>[Mageia-sysadm] questions about our infrastructure setup &amp; costs</H1>
+    <B>Buchan Milne</B> 
+    <A HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C201204022213.00278.bgmilne%40zarb.org%3E"
+       TITLE="[Mageia-sysadm] questions about our infrastructure setup &amp; costs">bgmilne at zarb.org
+       </A><BR>
+    <I>Mon Apr  2 22:12:59 CEST 2012</I>
+    <P><UL>
+        <LI>Previous message: <A HREF="004393.html">[Mageia-sysadm] questions about our infrastructure setup &amp; costs
+</A></li>
+        <LI>Next message: <A HREF="004376.html">[Mageia-sysadm] questions about our infrastructure setup &amp; costs
+</A></li>
+         <LI> <B>Messages sorted by:</B> 
+              <a href="date.html#4369">[ date ]</a>
+              <a href="thread.html#4369">[ thread ]</a>
+              <a href="subject.html#4369">[ subject ]</a>
+              <a href="author.html#4369">[ author ]</a>
+         </LI>
+       </UL>
+    <HR>  
+<!--beginarticle-->
+<PRE>On Monday, 2 April 2012 16:59:59 Michael Scherer wrote:
+&gt;<i> On Monday 02 April 2012 at 15:23 +0200, Romain d'Alverny wrote:
+</I>&gt;<i> &gt; Hi,
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; following last weekend's incident, and knowing that there are already
+</I>&gt;<i> &gt; some reflections and discussions about that, I'm posting the following
+</I>&gt;<i> &gt; questions/needs, with my treasurer/board hat; some of these may
+</I>&gt;<i> &gt; already have answers, so please just link me to them.
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; It comes down to:
+</I>&gt;<i> &gt;  - board needs to have an up-to-date view of how much our
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; infrastructure costs, and would cost in different setups; and this,
+</I>&gt;<i> &gt; split in separate, functional chunks;
+</I>&gt;<i> 
+</I>&gt;<i> That's a rather odd question, since with your treasurer hat, you should
+</I>&gt;<i> have all infos, so I do not really see what we can answer to that.
+</I>&gt;<i> 
+</I>&gt;<i> The list of servers is in puppet :
+</I>&gt;<i> <A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/">http://svnweb.mageia.org/adm/puppet/manifests/nodes/</A>
+</I>&gt;<i> 
+</I>&gt;<i> and each has some modules assigned to it; take these as the functional chunks.
+</I>&gt;<i> Unfortunately, the servers are all doing more than one task, so
+</I>&gt;<i> splitting them into functional chunks does not mean much.
+</I>&gt;<i> 
+</I>&gt;<i> &gt;  - how can we change our setup to: 1) reduce the impact of having one
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; chunk (here a faulty RJ45 in Marseille) shut down so much of the
+</I>&gt;<i> &gt; project for such a long time and
+</I>&gt;<i> 
+</I>&gt;<i> That's easy to explain.
+</I>&gt;<i> 
+</I>&gt;<i> You identify each single point of failure (or SPOF), and you make sure
+</I>&gt;<i> to remove the 'single' from SPOF by making it redundant.
+</I>&gt;<i> 
+</I>&gt;<i> For example, have 2 redundant power supplies, 2 redundant LDAP servers
+</I>&gt;<i> (we already do that), and 2 redundant network connections.
+</I>&gt;<i> 
+</I>&gt;<i> Of course, the downside is that it costs twice the price (at least),
+</I>&gt;<i> and it is more complex.
+</I>&gt;<i> 
+</I>&gt;<i> Another solution is to try to reduce the MTTR (mean time to repair).
+</I>&gt;<i> 
+</I>&gt;<i> &gt;  2) have a quick report, automatic
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; about this (not only for sysadmin, but for all users of our
+</I>&gt;<i> &gt; infrastructure).
+</I>&gt;<i> 
+</I>&gt;<i> I do think that the current xymon reports are sufficient.
+</I>
+There is some room for enhancements, but that requires a bit more knowledge 
+regarding real (physical) dependencies than I have at present.
+
+I can maybe add some examples which don't require that knowledge, so others 
+can add some more.
+
+&gt;<i> &gt; So here is how I would put it:
+</I>&gt;<i> &gt;  A. could you, as sysadmin, draw (graphically) the dependencies
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; between services, at a certain functional scale + their current
+</I>&gt;<i> &gt; location/host;
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt;    * goal: have an overview of Mageia infrastructure, from the outside
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; of sysadmin team (and yes, again, that is needed);
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt;    * can we get it produced from the puppet conf? =&gt; the goal being
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; for now to have such a visual overview first, not to have it
+</I>&gt;<i> &gt; automated.
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt;    * the function blocks I can think of would be (but add/split/fix
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; accordingly):
+</I>&gt;<i> &gt;      + core for communication &amp; doc:
+</I>&gt;<i> &gt;        - user accounts (LDAP, identity.m.o)
+</I>&gt;<i> &gt;        - communications (mailing-lists, mail server)
+</I>&gt;<i> &gt;        - documentation (Wiki, Bugzilla)
+</I>&gt;<i> &gt;        - a specific code repository (not related to the build system)
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; for adm and/or one dedicated to organization (paperwork, reports,
+</I>&gt;<i> &gt; constitution, etc.)
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt;      + Web hosts (www, blog, planet, forums, security notifs, etc.)
+</I>&gt;<i> &gt;      + core for building the distribution
+</I>&gt;<i> &gt;      
+</I>&gt;<i> &gt;        - code repo
+</I>&gt;<i> &gt;        - buildsystem
+</I>&gt;<i> &gt;        - translation tools
+</I>&gt;<i> &gt;        - other?
+</I>&gt;<i> &gt;      
+</I>&gt;<i> &gt;      + core for distribution software
+</I>&gt;<i> &gt;      
+</I>&gt;<i> &gt;        - primary mirror
+</I>&gt;<i> &gt;      
+</I>&gt;<i> &gt;      + other?
+</I>&gt;<i> &gt;  
+</I>&gt;<i> &gt;  B. based on these functional chunks, for each, could you:
+</I>&gt;<i> &gt;   * document what is needed for them: storage, bandwidth, what it
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; represent in full hardware today, what it should grow to. Goals are:
+</I>&gt;<i> &gt;     - to have a clear idea of how much it represents/costs: today, or
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; if we would move to other hosting solutions (paid or not, hardware or
+</I>&gt;<i> &gt; virtual);
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt;     - to know how much we need to budget in security for these services;
+</I>&gt;<i> &gt;     - to know what our options (and needs) are for migrating some
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; services to an architecture or a paid solution that would improve
+</I>&gt;<i> &gt; their availability (and accessibility in case of failure).
+</I>&gt;<i> 
+</I>
+Note that moving or duplicating some services may cost significantly more 
+than twice the current amount, taking into account 'in-network' or 'in-cloud' 
+traffic costs vs. transit to another cloud/network.
+
+&gt;<i> so basically, if I take the price from OVH ( as they have a lot of
+</I>&gt;<i> choices and are rather cheap ) :
+</I>&gt;<i> 
+</I>&gt;<i> - alamut would cost around 84 e per month at ovh.fr. That's the closest
+</I>&gt;<i> server we can find in their offer.
+</I>&gt;<i> 
+</I>&gt;<i> - valstar has many more processor cores ( 16 ) and less RAM, so let's
+</I>&gt;<i> evaluate this at 100e to 110e per month ( processors are more expensive
+</I>&gt;<i> than memory )
+</I>&gt;<i> 
+</I>&gt;<i> - ecosse would be around the same as alamut, but there is less ram so 70
+</I>&gt;<i> to 80 euros per month
+</I>&gt;<i> 
+</I>&gt;<i> - jonund has more processors, so let's also say around 100 to 110e per
+</I>&gt;<i> month.
+</I>&gt;<i> 
+</I>&gt;<i> - fiona would likely be 30 to 40 euros per month, given the price of
+</I>&gt;<i> Kimsufi ( cheaper servers from OVH )
+</I>&gt;<i> 
+</I>&gt;<i> - I cannot connect to sukuc from my bastion, so I do not know, but since
+</I>&gt;<i> that's a brand new server, let's say 80e per month.
+</I>&gt;<i> 
+</I>&gt;<i> As we cannot rent arm boards, let's assume that we will rent the space
+</I>&gt;<i> to host them.
+</I>&gt;<i> 
+</I>&gt;<i> Housing can be found in Paris for 300e :
+</I>&gt;<i> <A HREF="http://www.online.net/serveur-dedie/offre-dedibox-housing-dedirack.xhtml">http://www.online.net/serveur-dedie/offre-dedibox-housing-dedirack.xhtml</A>
+</I>&gt;<i> 
+</I>&gt;<i> since that's too much space for 2 ARM boards, I found a cheaper
+</I>&gt;<i> alternative :
+</I>&gt;<i> <A HREF="https://www.ovh.com/fr/housing/location_baie_1_a_3U.xml">https://www.ovh.com/fr/housing/location_baie_1_a_3U.xml</A>
+</I>&gt;<i> 99e
+</I>&gt;<i> 
+</I>&gt;<i> That makes around 570 to 600 euros per month for replacing the free
+</I>&gt;<i> hosting in LO with paid servers, hosted on one of the cheapest
+</I>&gt;<i> providers in the world. And for this price, we of course have no SSD on
+</I>&gt;<i> the builder ( there are some offers with a small SSD, count 10 euros more
+</I>&gt;<i> per month and per server ) etc.
+</I>&gt;<i> 
+</I>&gt;<i> If we want to just host them in Paris, I think it would cost around 600 euros
+</I>&gt;<i> per month just for the housing, since we would use more than 3U ( I do
+</I>&gt;<i> not know exactly how much ).
+</I>&gt;<i> 
+</I>&gt;<i> People can feel free to redo the cost analysis on amazon EC2 or
+</I>&gt;<i> rackspace. I was not able to work out how much alamut would cost at
+</I>&gt;<i> rackspace ( nor even whether it is possible to have a server where we
+</I>&gt;<i> are in charge ), and amazon ec2 pricing is to hosting what java is to my
+</I>&gt;<i> abacus.
+</I>&gt;<i> 
+</I>&gt;<i> And for being complete, I also searched random hosters around the
+</I>&gt;<i> world :
+</I>&gt;<i> 
+</I>&gt;<i> I found this
+</I>&gt;<i> <A HREF="http://www.razorservers.com/solutions/dedicated-servers/pricing/">http://www.razorservers.com/solutions/dedicated-servers/pricing/</A>
+</I>&gt;<i> so a server with the same spec as alamut is around 200$ for a more
+</I>&gt;<i> classic provider.
+</I>&gt;<i> 
+</I>&gt;<i> I found this
+</I>&gt;<i> <A HREF="http://www.server4you.com/root-server/server-details.php?products=3">http://www.server4you.com/root-server/server-details.php?products=3</A>
+</I>&gt;<i> would make 85$ ( since there is setup fee for each month ). Server4you
+</I>&gt;<i> is more like OVH.
+</I>&gt;<i> 
+</I>&gt;<i> and several others where the price is more around 150$ than 100$.
+</I>&gt;<i> 
+</I>&gt;<i> And of course, most of them have metered network connections that would
+</I>&gt;<i> maybe not be suitable for something like valstar, which acts as a primary
+</I>&gt;<i> mirror. For reference, since the server was started :
+</I>&gt;<i> 
+</I>&gt;<i> RX bytes:453228974131 (422.1 GiB)
+</I>&gt;<i> TX bytes:9311461347504 (8.4 TiB)
+</I>&gt;<i> 
+</I>&gt;<i> Uptime is 60 days.
+</I>&gt;<i> That's around 4 TiB of transfer per month.
+</I>
+How much of this is internal to the hosting provider?
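+
+For reference, the monthly figure quoted above can be reproduced from the 
+interface counters. A minimal sketch, assuming a 30-day month and that the 
+counters cover exactly the 60 days of uptime (both are assumptions); it says 
+nothing about how much of the traffic stays inside the provider's network:
+

```python
# Interface counters quoted above for valstar (bytes since boot).
RX_BYTES = 453228974131
TX_BYTES = 9311461347504
UPTIME_DAYS = 60        # uptime stated in the quoted message
DAYS_PER_MONTH = 30     # assumption: a 30-day month

TIB = 2 ** 40           # bytes per TiB


def monthly_tib(total_bytes, uptime_days, days_per_month=DAYS_PER_MONTH):
    """Scale a since-boot byte counter to an average monthly volume in TiB."""
    return total_bytes / uptime_days * days_per_month / TIB


if __name__ == "__main__":
    print("TX per month: %.2f TiB" % monthly_tib(TX_BYTES, UPTIME_DAYS))
    print("RX per month: %.2f TiB" % monthly_tib(RX_BYTES, UPTIME_DAYS))
```

+
+With the quoted counters this gives a little over 4 TiB sent per month.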
+
+&gt;<i> 
+</I>&gt;<i> That's for alamut, to compare :
+</I>&gt;<i> RX bytes:30792994686 (28.6 GiB)
+</I>&gt;<i> TX bytes:215624995862 (200.8 GiB)
+</I>&gt;<i> 
+</I>&gt;<i> While hosters often propose &quot;unlimited transfer&quot;, most don't, and most
+</I>&gt;<i> use unlimited in the same way that phone providers do. So we need to be
+</I>&gt;<i> wary on this point if we want to go further in the cost analysis.
+</I>&gt;<i> 
+</I>&gt;<i> &gt;  C. various questions:
+</I>&gt;<i> &gt;   * could both above documentation (A and B) be maintained through
+</I>&gt;<i> &gt;   changes;
+</I>&gt;<i> 
+</I>&gt;<i> That depends on how they will be done, but I do not foresee someone
+</I>&gt;<i> volunteering for that, and since the puppet information is not sufficient
+</I>&gt;<i> to express that in an automated manner ( there is support for graphing
+</I>&gt;<i> deps between modules but not between servers ), I doubt it will be
+</I>&gt;<i> written soon.
+</I>&gt;<i> 
+</I>&gt;<i> Nagios does support some form of graphs, but we already have a
+</I>&gt;<i> working monitoring system, and there is some more important stuff to do
+</I>&gt;<i> before changing it ( for example, making sure that the current one is
+</I>&gt;<i> read by people by reducing the amount of crap sent to the ml, and this
+</I>&gt;<i> would require someone fixing #4591, among others )
+</I>
+&quot;depends&quot; notation can be used to describe the dependencies between services 
+(including some logical tests). I have some scripts that draw diagrams with 
+near-real-time status (mostly network ones, e.g. links between manageable 
+switches, sometimes termed 'weather maps'), which could be extended to do some 
+automated diagrams based on the depends notation. I would actually like to 
+have this at work too, but at present I really only have (work) time to work 
+on 'official' projects that have project managers and budgets :-(.
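+
+As a rough illustration of that last idea (not the scripts mentioned above), 
+here is a minimal Python sketch that turns a dependency map into Graphviz DOT 
+text. It assumes the dependencies have already been extracted into a plain 
+mapping; the host/test pairs and the to_dot helper are hypothetical, and 
+status colouring is left out:
+

```python
# Hypothetical dependency map: each (host, test) key depends on the listed
# (host, test) pairs. The names are illustrative, not the real setup.
DEPENDS = {
    ("alamut", "http"): [("valstar", "ldap"), ("alamut", "conn")],
    ("valstar", "ldap"): [("valstar", "conn")],
}


def node_id(host, test):
    """Quoted DOT node name for one host/test pair."""
    return '"%s/%s"' % (host, test)


def to_dot(depends):
    """Render the dependency map as Graphviz DOT text, one edge per dependency."""
    lines = ["digraph depends {"]
    for (host, test), requirements in sorted(depends.items()):
        for req_host, req_test in requirements:
            lines.append("    %s -> %s;" % (node_id(host, test),
                                            node_id(req_host, req_test)))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(DEPENDS))
```

+
+The output can be rendered with Graphviz, for example with dot -Tsvg.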
+
+&gt;<i> 
+</I>&gt;<i> &gt;   * would it be possible to have the systems hosting our services to
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; have a prefix in their fqdn with the city/country they are located in?
+</I>&gt;<i> &gt; Goal: being more explicit about where a service is located at this
+</I>&gt;<i> &gt; time, so that a $ host www.mageia.org can answer me something like
+</I>&gt;<i> &gt; champagne.paris.fr.mageia.org - for instance. I don't mean to change
+</I>&gt;<i> &gt; all that, but I'm wondering about the opportunity.
+</I>&gt;<i> 
+</I>&gt;<i> What problem would it solve ?
+</I>
+Vs. what problems would it create. Note that many mail servers or anti-spam 
+systems score servers negatively for mismatch in forward/reverse records, and 
+email RFCs forbid pointing an MX at a CNAME.
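+
+Such a mismatch can be checked mechanically. A minimal sketch of a 
+forward-confirmed reverse DNS check, assuming IPv4 only and Python's standard 
+socket resolver calls; the function name and the injectable resolver 
+parameters are illustrative:
+

```python
import socket


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if the PTR name of ip resolves back to the same ip.

    reverse and forward default to the system resolver; they are parameters
    only so the check can be exercised without network access.
    """
    try:
        name = reverse(ip)[0]            # (hostname, aliases, addresses)
        addresses = forward(name)[2]     # IPv4 addresses for that name
    except OSError:                      # covers socket.herror / socket.gaierror
        return False
    return ip in addresses
```

+
+Calling forward_confirmed with a server's public address returns False 
+whenever the reverse record points at a name that does not resolve back to 
+that address.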
+
+DNS isn't supposed to be an ITIL-compliant CMDB ...
+
+Network engineers (mistakenly in some cases, IMHO) put reverse DNS on router 
+interfaces to be able to easily understand traceroutes. My opinion is that 
+network segments (typically following VLANs) should be named, and the network 
+segment name should be used instead of the interface name for 
+non-point-to-point links on routers.
+
+&gt;<i> The grouping of servers is already visible on xymon.mageia.org :
+</I>&gt;<i> <A HREF="http://xymon.mageia.org/xymon/servers/servers.html">http://xymon.mageia.org/xymon/servers/servers.html</A>
+</I>&gt;<i> 
+</I>&gt;<i> I pondered adding support for this in puppet, but in the end I
+</I>&gt;<i> didn't find any good reason to do that for now ( it would help, if we had
+</I>&gt;<i> enough servers, to set up ntp based on d-c, bastion server acl, etc, but
+</I>&gt;<i> we are not there yet ).
+</I>&gt;<i> 
+</I>&gt;<i> &gt;   * what do you think about maintaining a separate blog (for
+</I>&gt;<i> &gt; 
+</I>&gt;<i> &gt; opening/closing tickets + a global summary of what xymon provides
+</I>&gt;<i> &gt; already) under status.mageia.org (or maybe a different domain, for
+</I>&gt;<i> &gt; that matter)? (something similar to status.twitter.com)
+</I>&gt;<i> 
+</I>&gt;<i> Again, that solves none of our problems at all.
+</I>&gt;<i> 
+</I>&gt;<i> That solves a problem for a startup when they want to say &quot;we care about
+</I>&gt;<i> our customers, we give access to some form of monitoring&quot;, but we
+</I>&gt;<i> already give full access to our monitoring, so that would be redundant.
+</I>&gt;<i> 
+</I>&gt;<i> Now, maybe the current access is not nice enough, and I am sure we can
+</I>&gt;<i> do some css work to enhance that, but as an aesthetic issue, I would not
+</I>&gt;<i> make this a priority.
+</I>&gt;<i> 
+</I>&gt;<i> And I have seen no one saying that the current blog is not enough. If
+</I>&gt;<i> people do not read it, they will not read another web site.
+</I>
+Can we decide on what problem(s) we are trying to resolve?
+
+A) That we should be able to know when our network connection is down?
+
+IMHO, the hosting provider should be monitoring this (our data center business 
+does this for all managed hosting customers). If the hosting provider is not 
+able to do that, then our options are:
+1) Monitor servers in one location from servers in another, and ensure that 
+they can inform us without requiring the servers in the first location to be 
+available
+2) Monitor the network interfaces etc. from inside the network, but have a 
+non-network notification system, such as an SMS modem or an old cell phone 
+(Nokia 5110 was the usual choice a few years ago). Alternatively, a 3G dongle 
+for IP-based access could be considered.
+
+(note that so far, there is minimal cost involved) 
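+
+Option 1 could be sketched roughly as follows: a TCP reachability probe meant 
+to run from a host at a different site. The target list, ports and timeout 
+are assumptions, and the notification path (which must not depend on the 
+monitored site) is left out:
+

```python
import socket

# Illustrative targets at the *other* site; the ports are assumptions.
TARGETS = [("www.mageia.org", 80), ("xymon.mageia.org", 80)]


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(targets, check=is_reachable):
    """Return the (host, port) pairs for which the check fails."""
    return [(host, port) for host, port in targets if not check(host, port)]
```

+
+Running unreachable(TARGETS) from cron on the remote host and alerting when 
+the list is non-empty would be one way to use it.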
+
+B) Ensure that a single network connection can fail without a whole site 
+failing.
+
+Cost: 2 manageable layer 3 switches with VRRP or HSRP (Cisco), an additional 
+port from the provider, and their work to implement their side. Cisco 3560 is 
+probably the best entry-level switch for this (but I haven't consulted a CCNP 
+or CCIE yet ...).
+
+C) That we should be able to continue development of the distribution when a 
+site has failed?
+The cost to implement this correctly/reliably is usually only justifiable for 
+a real-time commerce system (Bank, very high volume ecommerce site competing 
+with Amazon in a specific region, real-time billing system for a mobile phone 
+operator)
+
+D) That once we are aware of a failure, we are able to inform users of the 
+systems of an outage.
+
+Beware that over-documenting things (we are doing ISO 20 000 and Business 
+Continuity efforts at work) does not necessarily result in a better system; it 
+just makes sure that you have killed lots of trees so that someone 
+will be able to read (but not necessarily understand) what needs to be done to 
+continue business in a disaster. The puppet-based config we have is superior 
+to what many companies have ... what we may rather need to do (if we can find 
+the time) is to hold some business continuity thought exercises.
+
+Regards,
+Buchan
+</PRE>
+
+
+
+
+
+
+
+
+
+<!--endarticle-->
+    <HR>
+
+<hr>
+<a href="https://www.mageia.org/mailman/listinfo/mageia-sysadm">More information about the Mageia-sysadm
+mailing list</a><br>
+</body></html>