aboutsummaryrefslogtreecommitdiffstats
path: root/README.BINREPO
diff options
context:
space:
mode:
Diffstat (limited to 'README.BINREPO')
-rw-r--r--README.BINREPO360
1 files changed, 360 insertions, 0 deletions
diff --git a/README.BINREPO b/README.BINREPO
new file mode 100644
index 0000000..78b951d
--- /dev/null
+++ b/README.BINREPO
@@ -0,0 +1,360 @@
+================================
+The detached binaries repository
+================================
+
+.. contents::
+
+A brief description
+===================
+
+Ideally, all binaries from packages sources (ie. all the binary files inside
+SOURCES/) will be placed in another subversion repository. This repository
+is called "tarballs repository", "binaries repository" or just "binrepo".
+It will contain mostly the same directory structure of the main repository,
+but instead of having SOURCES and SPECS, it will only have a SOURCES
+directory. Every copy/move operation should happen in both repositories.
+
+In order to allow deceasing binaries from older distributions, each stable
+distro will have its own subversion repository for binary files. mgarepo
+knows how to access these binrepos by checking which URL defined in the
+"[binrepo]" section of the configuration file matches the path-part of the
+repository being accessed. (see open issues)
+
+The package changelogs will be generated from SVN commit logs in the main
+"plaintext" repository ("txtrepo" for short) only. Old changelogs will be
+preserved, as even empty revisions are preserved in the binaries-filtering
+conversion.
+
+
+Mapping repositories states
+---------------------------
+
+In order to allow the use of `mgarepo {getsrpm,co} -r REV`, mgarepo will have
+to use a reference in the text repo which will be used to know in what
+state was the binrepo when a binary was uploaded.
+
+We cannot use direct revision number mapping through properties/files/etc
+mainly because we may have multiple binaries repositories, and eventually
+they can be filtered for reducing space, thus can't ensure revisions will
+survive. Thus another mechanism which relies on dates instead of revisions
+numbers is needed.
+
+When a binary is uploaded to the binrepo, the file `sha1.lst` is updated to
+have the files's hash and commited in the main text repo. This file will be
+used as the reference when the user uses -r REV on mgarepo. mgarepo will
+checkout the package in the main text repo with -r REV and then will use
+the "Last Changed Date" of `sha1.lst` to checkout the binrepo part. Thus,
+`sha1.lst` should be always commited to the main text repository *after* the
+corresponding binary files have been commited to the binrepo. Hooks in the
+main repository may be used to try to enforce this, by checking if the files
+changed in `sha1.lst` are already commited in the corresponding binrepo.
+
+Computation of `sha1.lst` is unlikely to be an issue:
+
+- it should not happen too often for any given package
+- it takes[0] less than 10s to sha1sum all SOURCES of openoffice.org-3.1-1mdv2010.0.src.rpm
+- it probably takes way less than the time to upload the file into the repository
+- it can be computed in parallel to the binrepo commit, and probably finish
+ before that, thus ready by the time `sha1.lst` should be commited
+- users don't need to verify the SHA1s "everytime", but the build system
+ does, thus Repsys can default to not verify and avoid wasting users' time
+
+The use of `sha1.lst` has the valuable property of tying the state of the main
+repository and the binrepo. With it, at getsrpm time of a package
+submission we can verify the SHA1 of the SOURCES-bin, and be sure that
+either the package will be built with the expected state, or early fail the
+build. It also allows for verifying binaries without trusting the binrepo,
+which may be useful if we consider using an unversioned plain filesystem
+storage in the future (for old distros or whatever), or at "client side",
+which maintainers may find useful.
+
+[0]: In a single core AMD Athlon(tm) 3800+ (2400Mhz)
+
+Mapping of revisions using SVN properties
+-----------------------------------------
+
+Alternatively to using the above "sha1.lst scheme", the revision mapping
+between the main repository and a binrepo could be done using subversion
+properties. This could be done by making every commit to binrepos also
+cause a corresponding commit in the main text repository to happen, which
+would update a property recording the current date. That is, a subversion
+property in the main text repository would be kept, such that for any given
+main repository revision, the corresponding state of the binrepos is
+obtainable (using the registered date).
+
+This would be "more transparent", as it can be maintened simply by using
+subversion hooks, without user intervention. OTOH, as every time the user
+commits to a binrepo this would result in a commit in the main repository,
+it would require the user to "svn up" the directories from there before
+commiting, after every binrepo commit. Also, this might result in a big
+number of "bogus" commits to the main repository, which could be seen as log
+pollution, and may potentially increase space usage etc..
+
+Why a new repository without the tarballs
+==========================================
+
+- the current svn repository is too large, hard to manage
+- big binary files (in general, "tarballs") history is of little value in
+ the distro development, we care much more about our specs, patches,
+ configurations, etc.; nonetheless, those big files we don't care much for
+ take the most resources and make backups and restoration in case of
+ failure very expensive, much more so than the more valuable data
+- there is no easy way to strip undesired tarballs without recreating the
+ whole repository
+- fedora and ubuntu have separated repositories, so we must have it too!
+
+Numbers
+-------
+
+Current repository is +390000 revisions and ~340Gb big, while the bzip2ed
+dumps backup for it takes about a bit more than half that size (FIXME:
+estimative, can't check in the backup server right now). Current txtrepo
+with the same number of revisions is ~180Gb big, takes about 2-3 days to be
+imported, while the gzipped full dump backup for it currently takes ~1.2Gb.
+Initial binrepo for Cooker (only `current/` packages' branches) took ~28Gb
+in disk, gzipped full dump for it takes ~25Gb, took about 5h30m to be
+populated from the current in use repository ("oldrepo").
+
+
+Drawbacks of this layout
+=========================
+
+- (always) everything that changes the single-repository usage increases the chance
+ of failure and make things more complicated.
+- subversion can't be used alone as easily as the current scheme allows
+- copying binaries between distro branches may not be "svn-cheap" anymore
+ (unless they're in the same binrepo)
+- ...
+
+
+Open issues
+============
+
+Multiple binrepos dont allow us to have one permanent URL
+---------------------------------------------------------
+
+We would have to update the configuration files from all the users in order
+to add a new stable repository. spuk suggests to use properties in the main
+text repo that would point to the right repository locations.
+
+How to handle failures when operating on more repositores?
+----------------------------------------------------------
+
+binrepos should replicate the structure of the main text repo. What we
+should do if the markrelease succeeds in the binrepo, but fails in the main
+text repo?
+
+R: Markrelease must be done first in the txtrepo. If it fails there "we're
+in trouble" (though currently, we just miss it[0]). When the markrelease is
+done in the txtrepo, we can do markrelease in the binrepo using '-r {DATE}',
+using the markrelease date in the txtrepo as '{DATE}'.
+
+[0] We should add transaction support for markrelease. The transaction could
+be stored out of the packages SVN (another SVN, a DB, a txt file, etc.), and
+would work like:
+
+0. mark beginning of markrelease, early failing the package build if it fails
+1. do markrelease
+2. mark sucessful end of markrelease
+ or mark failed markrelease, so we can replay it later
+
+
+Interesting use cases (first phase)
+===================================
+
+mgarepo co 1/mutt
+---------------------
+
+- mgarepo checkouts
+ http://svn.mageia.org/svn/packages/updates/1/mutt/current to the
+ mutt directory
+
+- mgarepo checkouts
+ http://svn.mageia.org/svn/binrepo/updates/1/mutt/current/SOURCES
+ into mutt/SOURCES-bin
+
+- creates symlinks for all files found in SOURCES-bin/ into ../SOURCES/
+
+ (rpm doesn't handle symlinks, this allows us to have explicit links and
+ proper src.rpm generates by rpmbuild)
+
+In case the path doesn't exist in the binrepo it will not fail, as we may
+have not imported all packages or the repository is not prepared to work on
+this model, etc.
+
+markrelease of a package
+------------------------
+
+::
+
+ $ mgarepo markrelease
+
+- will copy current/ to releases/VERSION/RELEASE, as usual
+
+- will copy current/ to releases/, on the binrepo too
+
+Optionally, markrelease could create revprops indicating which is the
+revision of current/ on the binrepo that represents the tarballs that are
+being tagged.
+
+
+Use cases to be implemented after the first phase
+=================================================
+
+upgrading to a newer version of the package
+-------------------------------------------
+
+::
+
+ $ cd bla/SOURCES/
+ $ wget http://prdownloads.sourceforge.net/bla/bla-1.6.tar.bz2
+ $ mgarepo upload bla-1.6.0.tar.bz2
+
+- mgarepo notices this is a tarball (checking filename and/or file size)
+
+- mgarepo will move the file to SOURCES-bin/, create the symlink, and svn-add
+ it to the working copy
+
+ $ # the user updates the spec
+
+ $ mgarepo rm SOURCES/bla-1.5.1.tar.bz2
+
+- it will remove the symlink and run svn rm on
+ SOURCES-bin/bla-1.6.0.tar.bz2::
+
+ $ cd ../ # package top dir
+ $ mgarepo ci
+
+- mgarepo will commit the new tarball on SOURCES-bin/ and then on the rest
+ of the working copy
+
+mgarepo sync would perform these steps too.
+
+importing a package
+-------------------
+
+ $ mgarepo putsrpm mypkg.src.rpm
+
+- mgarepo will open the src.rpm
+
+- will look for tarballs inside SOURCES/ and import them to
+ http://svn.mageia.org/svn/binrepo/cauldron/mypkg/current/SOURCES/
+
+- will move the tarballs out of SOURCES and import the remaining files to
+ http://svn.mageia.org/svn/packages/cauldron/mypkg/current/
+
+- will do whatever else putsrpm already does
+
+TODO
+=====
+
+First phase
+-----------
+
+- upload
+- markrelease
+- putsrpm
+- getsrpm
+
+
+Second phase
+------------
+
+- up
+- sync
+
+Rejected or postponed ideas
+===========================
+
+Use of a plain filesystem storage for the tarballs
+--------------------------------------------------
+
+This was planned, then rejected. It becomes too complicated when thinking
+about markrelease, and mapping SVN revisions in the main repository to
+binaries versions in the "tarballs storage", basically requiring
+implementing VCS-like features on top of filesystem. Would also require
+implementing another authentication and access scheme. The main feature
+would be ease of removing old binaries, which isn't much of a point because
+we don't know precisely what and when we want to remove, so may end up not
+removing much files anyway.
+
+Use of a plain unversioned filesystem storage for the tarballs
+--------------------------------------------------------------
+
+Different than the previous one, this would mean not relying at all on
+binary files history keeping. Structure could be something simple like::
+
+ packages/${pkg:0:1}/$pkg/$tarball
+
+This alternative does not suffice for Cooker, nor for supported distros, for
+which we want history. It could, however, at some point be used for "very
+old" distros, for which we may have lost interest in keeping *binaries*
+history (package history will kept "forever" in the main SVN repository).
+Alternatively, "resetting" an SVN binrepo (i.e. recreate the repository) to
+contain only the latest tarballs would probably take about the same amount
+of space, anyway...
+
+Open tarballs repository
+------------------------
+
+This idea is not really rejected. It does not go against splitting txtrepo
+and binrepo, but rather complement this idea, where the
+open-tarballs-repository would take the place of the binrepo. The txtrepo
+would still be used +- the same way. This repository could be used
+selectively, for packages where it makes sense, while most packages could be
+kept "closed", still as tarballs.
+
+Use of externals for more seamless Subversion usage
+---------------------------------------------------
+
+This idea is not discarded, but it just provides easiness. OTOH, it makes
+things more complicated:
+
+- markrelease: externals would have to be updated in order to make it point
+ to the tagged version in the binrepo, otherwise changes in
+ current@binrepo would change older releases;
+- branching whole distro: even though subversion now supports "relative
+ externals", we would have to update the URLs for *every* package on the
+ distro, as the path to reach the binrepo spans the local distribution
+ directory;
+- keeping externals up-to-date (as stated above and below)
+- authentication and access control: only markrelease action done by the
+ build system should be allowed to change externals (so what about importing
+ new packages?)
+- just a convenience, we don't need and shouldn't rely on externals for
+ running the build system, while most people will use the repositories via
+ Repsys, so why spend time to implement and keep it?
+- "svn co" works transparently, cool, but "svn co -r N" does not, otherwise
+ every change in the binrepo would require svn:externals to be updated in
+ the respective package;
+- it does not solve the problem of creating and handling symlinks between
+ SOURCES and SOURCES-bin.
+
+Keeping svn:externals updated for every package has almost the same cost of
+keeping the `sha1.lst` updated, with the difference that in the latter we
+would not have to update every package when creating distro branches.
+
+Use of "external" xdelta to save space on binaries
+--------------------------------------------------
+
+But how? First idea is this could be done by defining a protocol and
+assuming repository manipulation with mgarepo (for ease). Repsys could
+xdelta tarballs and add it to SVN with a special filename, then use it when
+checking out. Would require a policy/algorithm on when to ditch old whole
+binaries, too (i.e. hopefully wouldn't need to be handled manually by the
+maintainer). Also, this is something complemental to splitting the
+repository, so we may do it later, for binrepos.
+
+
+The Future
+==========
+
+- Open tarballs repositories
+
+ - suited for GIT, maybe multi-VCS
+ - incremental move
+ - not everything will be suited for this, must handle all cases or be
+ optional
+
+- Xdelta
+