Diffstat (limited to 'zarb-ml/mageia-sysadm/2012-April/004369.html')
-rw-r--r-- | zarb-ml/mageia-sysadm/2012-April/004369.html | 384 |
1 files changed, 384 insertions, 0 deletions
diff --git a/zarb-ml/mageia-sysadm/2012-April/004369.html b/zarb-ml/mageia-sysadm/2012-April/004369.html
new file mode 100644
index 000000000..2d3511fc8
--- /dev/null
+++ b/zarb-ml/mageia-sysadm/2012-April/004369.html
@@ -0,0 +1,384 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+ <HEAD>
+ <TITLE> [Mageia-sysadm] questions about our infrastructure setup & costs
+ </TITLE>
+ <LINK REL="Index" HREF="index.html" >
+ <LINK REL="made" HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C201204022213.00278.bgmilne%40zarb.org%3E">
+ <META NAME="robots" CONTENT="index,nofollow">
+ <META http-equiv="Content-Type" content="text/html; charset=us-ascii">
+ <LINK REL="Previous" HREF="004393.html">
+ <LINK REL="Next" HREF="004376.html">
+ </HEAD>
+ <BODY BGCOLOR="#ffffff">
+ <H1>[Mageia-sysadm] questions about our infrastructure setup & costs</H1>
+ <B>Buchan Milne</B>
+ <A HREF="mailto:mageia-sysadm%40mageia.org?Subject=Re%3A%20%5BMageia-sysadm%5D%20questions%20about%20our%20infrastructure%20setup%20%26%20costs&In-Reply-To=%3C201204022213.00278.bgmilne%40zarb.org%3E"
+ TITLE="[Mageia-sysadm] questions about our infrastructure setup & costs">bgmilne at zarb.org
+ </A><BR>
+ <I>Mon Apr 2 22:12:59 CEST 2012</I>
+ <P><UL>
+ <LI>Previous message: <A HREF="004393.html">[Mageia-sysadm] questions about our infrastructure setup & costs
+</A></li>
+ <LI>Next message: <A HREF="004376.html">[Mageia-sysadm] questions about our infrastructure setup & costs
+</A></li>
+ <LI> <B>Messages sorted by:</B>
+ <a href="date.html#4369">[ date ]</a>
+ <a href="thread.html#4369">[ thread ]</a>
+ <a href="subject.html#4369">[ subject ]</a>
+ <a href="author.html#4369">[ author ]</a>
+ </LI>
+ </UL>
+ <HR>
+<!--beginarticle-->
+<PRE>On Monday, 2 April 2012 16:59:59 Michael Scherer wrote:
+><i> On Monday 02 April 2012 at 15:23 +0200, Romain d'Alverny wrote:
+</I>><i> > Hi,
+</I>><i> >
+</I>><i> > following past week-end incident, and I know that there are already
+</I>><i> > some reflexions and discussions about that, I'm posting the following
+</I>><i> > questions/needs, with my treasurer/board hat; some of these may
+</I>><i> > already have answers, so please just link me to them.
+</I>><i> >
+</I>><i> > It comes down to:
+</I>><i> > - board needs to have an up-to-date view of how much our
+</I>><i> >
+</I>><i> > infrastructure costs, and would cost in different setups; and this,
+</I>><i> > split in separate, functional chunks;
+</I>><i>
+</I>><i> That's a rather odd question, since with your treasurer hat, you should
+</I>><i> have all infos, so I do not really see what we can answer to that.
+</I>><i>
+</I>><i> The list of servers is in puppet :
+</I>><i> <A HREF="http://svnweb.mageia.org/adm/puppet/manifests/nodes/">http://svnweb.mageia.org/adm/puppet/manifests/nodes/</A>
+</I>><i>
+</I>><i> and each has some module assigned to it, take this as functional chunk.
+</I>><i> Unfortunately, the servers are all doing more than one tasks, so
+</I>><i> splitting them in functional chunks do not mean much.
+</I>><i>
+</I>><i> > - how can we change our setup to: 1) reduce the impact of having one
+</I>><i> >
+</I>><i> > chunk (here a faulty RJ45 in Marseille) shut down so much of the
+</I>><i> > project for such a long time and
+</I>><i>
+</I>><i> That's easy to explain.
+</I>><i>
+</I>><i> You identify each single point of failure, ( or spof ) and you make sure
+</I>><i> to remove the 'single' from SPOF by making it redundant.
+</I>><i>
+</I>><i> For example, have 2 redundant power supply. Have 2 redundant ldap server
+</I>><i> ( we already do it ), have 2 redundant network connection.
+</I>><i>
+</I>><i> Of course, the downside is that it cost twice the price ( at least ),
+</I>><i> and it is more complex.
+</I>><i>
+</I>><i> Another solution is to try to increase the MTRR.
+</I>><i>
+</I>><i> > 2) have a quick report, automatic
+</I>><i> >
+</I>><i> > about this (not only for sysadmin, but for all users of our
+</I>><i> > infrastructure).
+</I>><i>
+</I>><i> I do think for me that the current report of xymon are sufficient.
+</I>
+There is some room for enhancements, but that requires a bit more knowledge
+regarding real (physical) dependencies than I have at present.
+
+I can maybe add some examples which don't require that knowledge, so others
+can add some more.
+
+><i> > So here is how I would put it:
+</I>><i> > A. could you, as sysadmin, draw (graphically) the dependencies
+</I>><i> >
+</I>><i> > between services, at a certain functional scale + their current
+</I>><i> > location/host;
+</I>><i> >
+</I>><i> > * goal: have an overview of Mageia infrastructure, from the outside
+</I>><i> >
+</I>><i> > of sysadmin team (and yes, again, that is needed);
+</I>><i> >
+</I>><i> > * can we get it produced from the puppet conf? => the goal being
+</I>><i> >
+</I>><i> > for now to have such a visual overview first, not to have it
+</I>><i> > automated.
+</I>><i> >
+</I>><i> > * the function blocks I can think of would be (but add/split/fix
+</I>><i> >
+</I>><i> > accordingly):
+</I>><i> > + core for communication & doc:
+</I>><i> > - user accounts (LDAP, identity.m.o)
+</I>><i> > - communications (mailing-lists, mail server)
+</I>><i> > - documentation (Wiki, Bugzilla)
+</I>><i> > - a specific code repository (not related to the build system)
+</I>><i> >
+</I>><i> > for adm and/or one dedicated to organization (paperwork, reports,
+</I>><i> > constitution, etc.)
+</I>><i> >
+</I>><i> > + Web hosts (www, blog, planet, forums, security notifs, etc.)
+</I>><i> > + core for building the distribution
+</I>><i> >
+</I>><i> > - code repo
+</I>><i> > - buildsystem
+</I>><i> > - translation tools
+</I>><i> > - other?
+</I>><i> >
+</I>><i> > + core for distribution software
+</I>><i> >
+</I>><i> > - primary mirror
+</I>><i> >
+</I>><i> > + other?
+</I>><i> >
+</I>><i> > B. based on these functional chunks, for each, could you:
+</I>><i> > * document what is needed for them: storage, bandwidth, what it
+</I>><i> >
+</I>><i> > represent in full hardware today, what it should grow to. Goals are:
+</I>><i> > - to have a clear idea of how much it represents/costs: today, or
+</I>><i> >
+</I>><i> > if we would move to other hosting solutions (paid or not, hardware or
+</I>><i> > virtual);
+</I>><i> >
+</I>><i> > - to know how much we need to budget in security for these services;
+</I>><i> > - to know what our options (and needs) are for migrating some
+</I>><i> >
+</I>><i> > services to an architecture or a paid solution that would improve
+</I>><i> > their availability (and accessibility in case of failure).
+</I>><i>
+</I>
+Note that moving some, or duplicating some services may be significantly more
+than twice the current cost, taking into account 'in-network' or 'in-cloud'
+traffic costs vs. transit to another cloud/network.
+
+><i> so basically, if I take the price from OVH ( as they have a lot of
+</I>><i> choices and are rather cheap ) :
+</I>><i>
+</I>><i> - alamut would cost around 84 e per month at ovh.fr. That's the closest
+</I>><i> server we can find in their offer.
+</I>><i>
+</I>><i> - valstar has much more processors, ( 16 core ) and less ram, so let's
+</I>><i> evaluate this at 100e to 110e per month ( processor are more expensive
+</I>><i> than memory )
+</I>><i>
+</I>><i> - ecosse would be around the same as alamut, but there is less ram so 70
+</I>><i> to 80 euros per month
+</I>><i>
+</I>><i> - jonund has more processor so let's say too around 100 to 110e per
+</I>><i> month.
+</I>><i>
+</I>><i> - fiona would likely be 30 to 40 euros per month, given the price of
+</I>><i> Kimsufi ( cheaper servers from OVH )
+</I>><i>
+</I>><i> - I cannot connect to sukuc from my bastion, so I do not know, but since
+</I>><i> that's a brand new server, let's say 80e per month.
+</I>><i>
+</I>><i> As we cannot rent arm boards, let's assume that we will rent the space
+</I>><i> to host them.
+</I>><i>
+</I>><i> Housing can be found in Paris for 300e :
+</I>><i> <A HREF="http://www.online.net/serveur-dedie/offre-dedibox-housing-dedirack.xhtml">http://www.online.net/serveur-dedie/offre-dedibox-housing-dedirack.xhtml</A>
+</I>><i>
+</I>><i> since that's too much space for 2 arm board, I found a cheaper
+</I>><i> alternative :
+</I>><i> <A HREF="https://www.ovh.com/fr/housing/location_baie_1_a_3U.xml">https://www.ovh.com/fr/housing/location_baie_1_a_3U.xml</A>
+</I>><i> 99e
+</I>><i>
+</I>><i> That make around 570 to 600 euros per month, for replacing the free
+</I>><i> hosting in LO with paid server, hosting them on one of the cheapest
+</I>><i> providers in the world. And for this price, we have of course no SSD on
+</I>><i> the builder ( there is some offer with small SSD, count 10 euros more
+</I>><i> per month and per server ) etc.
+</I>><i>
+</I>><i> If we want to just host them in Paris, I think we can have for 600 euros
+</I>><i> per month, just for the housing, since we would use more than 3U ( I do
+</I>><i> not know exactly how much ).
+</I>><i>
+</I>><i> People can feel free to redo the cost analysis on amazon EC2 or
+</I>><i> rackspace, I was not able to understand how much would alamut cost at
+</I>><i> rackspace ( not even if that's even possible to have a server where we
+</I>><i> are in charge ), and amazon ec2 pricing is to hosting what java is to my
+</I>><i> abacus.
+</I>><i>
+</I>><i> And for being complete, I also searched random hosters around the
+</I>><i> world :
+</I>><i>
+</I>><i> I found this
+</I>><i> <A HREF="http://www.razorservers.com/solutions/dedicated-servers/pricing/">http://www.razorservers.com/solutions/dedicated-servers/pricing/</A>
+</I>><i> so a server with the same spec as alamut is around 200$ for a more
+</I>><i> classic provider.
+</I>><i>
+</I>><i> I found this
+</I>><i> <A HREF="http://www.server4you.com/root-server/server-details.php?products=3">http://www.server4you.com/root-server/server-details.php?products=3</A>
+</I>><i> would make 85$ ( since there is setup fee for each month ). Server4you
+</I>><i> is more like OVH.
+</I>><i>
+</I>><i> and several others where the price is more around 150$ than 100$.
+</I>><i>
+</I>><i> And of course, most of them have metered network connections that would
+</I>><i> maybe not be suitable for something like valstar, who act as a primary
+</I>><i> mirror. For reference, since we have started the server :
+</I>><i>
+</I>><i> RX bytes:453228974131 (422.1 GiB)
+</I>><i> TX bytes:9311461347504 (8.4 TiB)
+</I>><i>
+</I>><i> Uptime is 60 days.
+</I>><i> That's around 4 T per month of transfer.
+</I>
+How much of this is internal to the hosting provider?
+
+><i>
+</I>><i> That's for alamut, to compare :
+</I>><i> RX bytes:30792994686 (28.6 GiB)
+</I>><i> TX bytes:215624995862 (200.8 GiB)
+</I>><i>
+</I>><i> While hosters often propose "unlimited transfer", most don't, and most
+</I>><i> use unlimited in the same way that phone providers do. So we need to be
+</I>><i> wary on this point if we want to go further in the cost analysis.
+</I>><i>
+</I>><i> > C. various questions:
+</I>><i> > * could both above documentation (A and B) be maintained through
+</I>><i> > changes;
+</I>><i>
+</I>><i> That depend on how they will be done, but I do not foresee someone
+</I>><i> volunteering for that, and since puppet informations are not sufficient
+</I>><i> to express that in a automated manner ( there is support for graphing
+</I>><i> deps between modules but not inter servers ), I doubt to see it being
+</I>><i> written soon.
+</I>><i>
+</I>><i> Nagios do support doing some form of graphs, but we already have a
+</I>><i> working monitoring system, and there is some more important stuff to do
+</I>><i> before changing it ( for example, making sure that the current one is
+</I>><i> read by people by reducing the amount of crap sent on the ml, and this
+</I>><i> would requires someone fixing #4591, among others )
+</I>
+"depends" notation can be used to describe the dependencies between services
+(including some logical tests). I have some scripts that draw diagrams with
+near-real-time status (mostly network ones, e.g. links between manageable
+switches, sometimes termed 'weather maps'), which could be extended to do some
+automated diagrams based on the depends notation. I would actually like to
+have this at work too, but at present I really only have (work) time to work
+on 'official' projects that have project managers and budgets :-(.
+
+><i>
+</I>><i> > * would it be possible to have the systems hosting our services to
+</I>><i> >
+</I>><i> > have a prefix in their fqdn with the city/country they are located in?
+</I>><i> > Goal: being more explicit about where a service is located at this
+</I>><i> > time, so that a $ host www.mageia.org can answer me something like
+</I>><i> > champagne.paris.fr.mageia.org - for instance. I don't mean to change
+</I>><i> > all that, but I'm wondering about the opportunity.
+</I>><i>
+</I>><i> What problem would it solve ?
+</I>
+Vs. what problems would it create. Note that many mail servers or anti-spam
+systems score servers negatively for mismatch in forward/reverse records, and
+email RFCs forbid pointing an MX at a CNAME.
+
+DNS isn't supposed to be an ITIL-compliant CMDB ...
+
+Network engineers (mistakenly in some cases, IMHO) put reverse DNS on router
+interfaces to be able to easily understand traceroutes. My opinion is that
+network segments (typically following VLANs) should be named, and network
+segment should be used instead of interface name for non-point-to-point links
+on routers.
+
+><i> The grouping of servers is already visible on xymon.mageia.org :
+</I>><i> <A HREF="http://xymon.mageia.org/xymon/servers/servers.html">http://xymon.mageia.org/xymon/servers/servers.html</A>
+</I>><i>
+</I>><i> I pondered on adding support this in puppet for that, but in the end, I
+</I>><i> didn't found any good reason to do that for now ( would help if we have
+</I>><i> enough server, to setup ntp based on d-c, bastion server acl, etc, but
+</I>><i> we are not there yet ).
+</I>><i>
+</I>><i> > * what do you think about maintaining a separate blog (for
+</I>><i> >
+</I>><i> > opening/closing tickets + a global summary of what xymon provides
+</I>><i> > already) under status.mageia.org (or maybe a different domain, for
+</I>><i> > that matter)? (something similar to status.twitter.com)
+</I>><i>
+</I>><i> Again, that solve none of our problems at all.
+</I>><i>
+</I>><i> That solve a problem for a startup when they want to say "we care about
+</I>><i> our customer, we give access to some form of monitoring", but we do
+</I>><i> already give full access to our monitoring, so that would be redundant.
+</I>><i>
+</I>><i> Now, maybe the current access is not nice enough, and I am sure we can
+</I>><i> do some css work to enhance that, but as a aesthetic issue, I would not
+</I>><i> make this a priority.
+</I>><i>
+</I>><i> And I have seen no one saying that the current blog is not enough. If
+</I>><i> people do not read it, they will not read another web site.
+</I>
+Can we decide on what problem(s) we are trying to resolve?
+
+A)That we should be able to know when our network connection is down?
+
+IMHO, the hosting provider should be monitoring this (our data center business
+does this for all managed hosting customers). If the hosting provider is not
+able to do that, then our options are:
+1)Monitor servers in one location from servers in another, and ensure that
+they can inform us without requiring the servers in the first location to be
+available
+2)Monitor the network interfaces etc. from inside the network, but have a non-
+network notification system, such as SMS modem, or old cell-phone (Nokia 5110
+was the usual choice a few years ago). Alternatively, a 3G dongle for IP-based
+access could be considered.
+
+(note that so far, there is minimal cost involved)
+
+B)Ensure that a single network connection can fail without a whole site
+failing.
+
+Cost: 2 manageable layer 3 switches with VRRP or HSRP (Cisco), an additional
+port from the provider, and their work to implement their side. Cisco 3560 is
+probably the best entry-level switch for this (but I haven't consulted a CCNP
+or CCIE yet ...).
+
+C)That we should be able to continue development of the distribution when a
+site has failed?
+The cost to implement this correctly/reliably is usually only justifiable for
+a real-time commerce system (Bank, very high volume ecommerce site competing
+with Amazon in a specific region, real-time billing system for a mobile phone
+operator)
+
+D)That once we are aware of a failure, we are able to inform users of the
+systems of an outage.
+
+Beware that over-documenting things (we are doing ISO 20 000 and Business
+Continuity efforts at work) do not necessarily result in a better system, they
+just make sure that you have killed lots of trees making sure that someone
+will be able to read (but not necessarily understand) what needs to be done to
+continue business in a disaster. The puppet-based config we have is superior
+to what many companies have ... what we may rather need to do (if we can find
+the time) is to hold some business continuity thought exercises.
+
+Regards,
+Buchan
+</PRE>
+
+
+
+
+
+
+
+
+
+<!--endarticle-->
+ <HR>
+ <P><UL>
+ <!--threads-->
+ <LI>Previous message: <A HREF="004393.html">[Mageia-sysadm] questions about our infrastructure setup & costs
+</A></li>
+ <LI>Next message: <A HREF="004376.html">[Mageia-sysadm] questions about our infrastructure setup & costs
+</A></li>
+ <LI> <B>Messages sorted by:</B>
+ <a href="date.html#4369">[ date ]</a>
+ <a href="thread.html#4369">[ thread ]</a>
+ <a href="subject.html#4369">[ subject ]</a>
+ <a href="author.html#4369">[ author ]</a>
+ </LI>
+ </UL>
+
+<hr>
+<a href="https://www.mageia.org/mailman/listinfo/mageia-sysadm">More information about the Mageia-sysadm
+mailing list</a><br>
+</body></html>
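The monthly figure Michael arrives at in the message above can be cross-checked by summing the ranges he lists. This minimal Python sketch takes his numbers exactly as quoted (euros per month at OVH-like prices, with 80 euros for sukuc being his own guess and 99 euros for the OVH 1U-3U housing offer for the ARM boards). The dictionary and the `total_range` helper are illustrative names, not anything from the Mageia infrastructure.

```python
# Back-of-envelope check of the monthly hosting estimate from the thread.
# Prices are euros per month as (low, high) ranges taken from the message;
# single figures are stored as equal bounds. The sukuc price is itself a guess.
MONTHLY_EUR = {
    "alamut": (84, 84),
    "valstar": (100, 110),
    "ecosse": (70, 80),
    "jonund": (100, 110),
    "fiona": (30, 40),
    "sukuc": (80, 80),
    "arm-housing (OVH 1-3U)": (99, 99),
}


def total_range(prices):
    """Return the (low, high) monthly total for a dict of (low, high) ranges."""
    low = sum(lo for lo, _ in prices.values())
    high = sum(hi for _, hi in prices.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(MONTHLY_EUR)
    print(f"Estimated total: {low} to {high} EUR per month")
```

Run as a script, it reports 563 to 603 euros per month, in line with the "around 570 to 600 euros" in the message. It leaves out the optional SSD surcharge (about 10 euros per server per month) and any traffic costs, which Buchan notes can dominate once services are split across networks.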
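Michael's "around 4 T per month" for valstar follows from the interface TX counter and the 60-day uptime quoted in the message. A minimal sketch of that scaling, assuming the counter started at boot, never wrapped, and that traffic was spread evenly over the period (the `monthly_bytes` function name is hypothetical):

```python
# Rough monthly transfer estimate from a cumulative interface byte counter.
# Assumptions: the counter started at zero at boot, never wrapped, and
# traffic was roughly uniform over the uptime period.

TIB = 2 ** 40  # bytes per tebibyte (the unit ifconfig labels "TiB")


def monthly_bytes(counter_bytes, uptime_days, days_per_month=30):
    """Scale a cumulative byte counter to an average per-month figure."""
    if uptime_days <= 0:
        raise ValueError("uptime_days must be positive")
    return counter_bytes * days_per_month / uptime_days


if __name__ == "__main__":
    # valstar TX counter and uptime as quoted in the thread
    valstar_tx = 9311461347504
    per_month = monthly_bytes(valstar_tx, uptime_days=60)
    print(f"{per_month / 1e12:.2f} TB ({per_month / TIB:.2f} TiB) per 30-day month")
```

For valstar's TX counter this gives roughly 4.7 TB (about 4.2 TiB) per 30-day month, consistent with the message. alamut's counters are not scaled here because its uptime is not given in the thread.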
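Romain asks for a drawing of the dependencies between services, and Buchan suggests that a "depends" notation could drive automated diagrams. The sketch below only illustrates the general idea: it turns a hand-written service-to-dependency map into Graphviz DOT text. The map format, the `to_dot` function and every service and host name in the example are hypothetical placeholders, not Mageia's actual topology, and the code does not parse xymon configuration.

```python
# Sketch: render a service dependency map as Graphviz DOT text.
# The map format (service -> list of services it depends on) is an assumption
# for illustration; it is not xymon's configuration syntax.
# Names containing double quotes are not escaped.


def to_dot(depends, hosts=None, name="services"):
    """Return DOT source with an edge from each service to each dependency.

    depends: dict mapping a service name to the services it needs.
    hosts:   optional dict mapping a service name to the host it runs on,
             shown as a second line in the node label.
    """
    hosts = hosts or {}
    nodes = set(depends)
    for deps in depends.values():
        nodes.update(deps)
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]
    for node in sorted(nodes):
        if node in hosts:
            label = f"{node}\\n({hosts[node]})"  # DOT "\n" = line break
        else:
            label = node
        lines.append(f'  "{node}" [label="{label}"];')
    for service in sorted(depends):
        for dep in sorted(depends[service]):
            lines.append(f'  "{service}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example only; edges and host names are placeholders.
    example = {
        "wiki": ["ldap", "db"],
        "bugzilla": ["ldap", "db", "mail"],
        "mailing-lists": ["mail"],
    }
    print(to_dot(example, hosts={"ldap": "host-a", "db": "host-b"}))
```

Piping the output through Graphviz's `dot -Tsvg` produces the picture. The map itself would still have to be filled in by someone who knows the real physical and logical dependencies, which is the knowledge Buchan says is currently missing.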
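Buchan notes that many mail servers and anti-spam systems score a mismatch between forward and reverse DNS records negatively, which is one cost of encoding locations into host names. A common form of that test is forward-confirmed reverse DNS (FCrDNS): look up the PTR name for an address, then check that the name resolves back to the same address. The sketch below uses Python's standard `socket.gethostbyaddr` and `socket.gethostbyname_ex` calls and handles IPv4 only (`gethostbyname_ex` does not return IPv6 addresses). It illustrates the idea and does not reproduce any particular anti-spam product's scoring; the `fcrdns_ok` name is hypothetical.

```python
# Sketch of a forward-confirmed reverse DNS (FCrDNS) check for one IPv4
# address: the PTR name of the address must resolve back to that address.
import socket


def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (ok, ptr_name). The resolvers are injectable for testing."""
    try:
        ptr_name, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False, None  # no PTR record, or the lookup failed
    try:
        _name, _aliases, forward_addrs = forward(ptr_name)
    except OSError:
        return False, ptr_name  # the PTR name does not resolve forward
    return ip in forward_addrs, ptr_name


if __name__ == "__main__":
    import sys

    for address in sys.argv[1:]:
        ok, name = fcrdns_ok(address)
        print(f"{address}: PTR={name} forward-confirmed={ok}")
```

Saved as a script and given one or more IPv4 addresses on the command line, it reports whether each PTR name resolves back to its address; this is the kind of consistency any host renaming scheme would have to preserve.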