diff options
Diffstat (limited to 'BRANCH')
-rw-r--r-- | BRANCH | 419 |
1 files changed, 0 insertions, 419 deletions
@@ -1,419 +0,0 @@ -================================ -The detached binaries repository -================================ - -.. contents:: - -A brief description -=================== - -Ideally, all binaries from packages sources (ie. all the binary files inside -SOURCES/) will be placed in another subversion repository. This repository -is called "tarballs repository", "binaries repository" or just "binrepo". -It will contain mostly the same directory structure of the main repository, -but instead of having SOURCES and SPECS, it will only have a SOURCES -directory. Every copy/move operation should happen in both repositories. - -In order to allow deceasing binaries from older distributions, each stable -distro will have its own subversion repository for binary files. repsys -knows how to access these binrepos by checking which URL defined in the -"[binrepo]" section of the configuration file matches the path-part of the -repository being accessed. (see open issues) - -The package changelogs will be generated from SVN commit logs in the main -"plaintext" repository ("txtrepo" for short) only. Old changelogs will be -preserved, as even empty revisions are preserved in the binaries-filtering -conversion. - - -Mapping repositories states ---------------------------- - -In order to allow the use of `repsys {getsrpm,co} -r REV`, repsys will have -to use a reference in the text repo which will be used to know in what -state was the binrepo when a binary was uploaded. - -We cannot use direct revision number mapping through properties/files/etc -mainly because we may have multiple binaries repositories, and eventually -they can be filtered for reducing space, thus can't ensure revisions will -survive. Thus another mechanism which relies on dates instead of revisions -numbers is needed. - -When a binary is uploaded to the binrepo, the file `sha1.lst` is updated to -have the files's hash and commited in the main text repo. This file will be -used as the reference when the user uses -r REV on repsys. repsys will -checkout the package in the main text repo with -r REV and then will use -the "Last Changed Date" of `sha1.lst` to checkout the binrepo part. Thus, -`sha1.lst` should be always commited to the main text repository *after* the -corresponding binary files have been commited to the binrepo. Hooks in the -main repository may be used to try to enforce this, by checking if the files -changed in `sha1.lst` are already commited in the corresponding binrepo. - -Computation of `sha1.lst` is unlikely to be an issue: - -- it should not happen too often for any given package -- it takes[0] less than 10s to sha1sum all SOURCES of openoffice.org-3.1-1mdv2010.0.src.rpm -- it probably takes way less than the time to upload the file into the repository -- it can be computed in parallel to the binrepo commit, and probably finish - before that, thus ready by the time `sha1.lst` should be commited -- users don't need to verify the SHA1s "everytime", but the build system - does, thus Repsys can default to not verify and avoid wasting users' time - -The use of `sha1.lst` has the valuable property of tying the state of the main -repository and the binrepo. With it, at getsrpm time of a package -submission we can verify the SHA1 of the SOURCES-bin, and be sure that -either the package will be built with the expected state, or early fail the -build. It also allows for verifying binaries without trusting the binrepo, -which may be useful if we consider using an unversioned plain filesystem -storage in the future (for old distros or whatever), or at "client side", -which maintainers may find useful. - -[0]: In a single core AMD Athlon(tm) 3800+ (2400Mhz) - -Mapping of revisions using SVN properties ------------------------------------------ - -Alternatively to using the above "sha1.lst scheme", the revision mapping -between the main repository and a binrepo could be done using subversion -properties. This could be done by making every commit to binrepos also -cause a corresponding commit in the main text repository to happen, which -would update a property recording the current date. That is, a subversion -property in the main text repository would be kept, such that for any given -main repository revision, the corresponding state of the binrepos is -obtainable (using the registered date). - -This would be "more transparent", as it can be maintened simply by using -subversion hooks, without user intervention. OTOH, as every time the user -commits to a binrepo this would result in a commit in the main repository, -it would require the user to "svn up" the directories from there before -commiting, after every binrepo commit. Also, this might result in a big -number of "bogus" commits to the main repository, which could be seen as log -pollution, and may potentially increase space usage etc.. - -Why a new repository without the tarballs -========================================== - -- the current svn repository is too large, hard to manage -- big binary files (in general, "tarballs") history is of little value in - the distro development, we care much more about our specs, patches, - configurations, etc.; nonetheless, those big files we don't care much for - take the most resources and make backups and restoration in case of - failure very expensive, much more so than the more valuable data -- there is no easy way to strip undesired tarballs without recreating the - whole repository -- fedora and ubuntu have separated repositories, so we must have it too! - -Numbers -------- - -Current repository is +390000 revisions and ~340Gb big, while the bzip2ed -dumps backup for it takes about a bit more than half that size (FIXME: -estimative, can't check in the backup server right now). Current txtrepo -with the same number of revisions is ~180Gb big, takes about 2-3 days to be -imported, while the gzipped full dump backup for it currently takes ~1.2Gb. -Initial binrepo for Cooker (only `current/` packages' branches) took ~28Gb -in disk, gzipped full dump for it takes ~25Gb, took about 5h30m to be -populated from the current in use repository ("oldrepo"). - - -Drawbacks of this layout -========================= - -- (always) everything that changes the single-repository usage increases the chance - of failure and make things more complicated. -- subversion can't be used alone as easily as the current scheme allows -- copying binaries between distro branches may not be "svn-cheap" anymore - (unless they're in the same binrepo) -- ... - - -Open issues -============ - -Multiple binrepos dont allow us to have one permanent URL ---------------------------------------------------------- - -We would have to update the configuration files from all the users in order -to add a new stable repository. spuk suggests to use properties in the main -text repo that would point to the right repository locations. - -How to handle failures when operating on more repositores? ----------------------------------------------------------- - -binrepos should replicate the structure of the main text repo. What we -should do if the markrelease succeeds in the binrepo, but fails in the main -text repo? - -R: Markrelease must be done first in the txtrepo. If it fails there "we're -in trouble" (though currently, we just miss it[0]). When the markrelease is -done in the txtrepo, we can do markrelease in the binrepo using '-r {DATE}', -using the markrelease date in the txtrepo as '{DATE}'. - -[0] We should add transaction support for markrelease. The transaction could -be stored out of the packages SVN (another SVN, a DB, a txt file, etc.), and -would work like: - -0. mark beginning of markrelease, early failing the package build if it fails -1. do markrelease -2. mark sucessful end of markrelease - or mark failed markrelease, so we can replay it later - - -Interesting use cases (first phase) -=================================== - -repsys co 2008.1/mutt ---------------------- - -- repsys checkouts - http://svn.mandriva.com/svn/packages/updates/2008.1/mutt/current to the - mutt directory - -- repsys checkouts - http://svn.mandriva.com/svn/binrepo/updates/2008.1/mutt/current/SOURCES - into mutt/SOURCES-bin - -- creates symlinks for all files found in SOURCES-bin/ into ../SOURCES/ - - (rpm doesn't handle symlinks, this allows us to have explicit links and - proper src.rpm generates by rpmbuild) - -In case the path doesn't exist in the binrepo it will not fail, as we may -have not imported all packages or the repository is not prepared to work on -this model, etc. - -markrelease of a package ------------------------- - -:: - - $ repsys markrelease - -- will copy current/ to releases/VERSION/RELEASE, as usual - -- will copy current/ to releases/, on the binrepo too - -Optionally, markrelease could create revprops indicating which is the -revision of current/ on the binrepo that represents the tarballs that are -being tagged. - - -Use cases to be implemented after the first phase -================================================= - -upgrading to a newer version of the package -------------------------------------------- - -:: - - $ cd bla/SOURCES/ - $ wget http://prdownloads.sourceforge.net/bla/bla-1.6.tar.bz2 - $ repsys add bla-1.6.0.tar.bz2 - -- repsys notices this is a tarball (checking filename and/or file size) - -- repsys will move the file to SOURCES-bin/, create the symlink, and svn-add - it to the working copy - - $ # the user updates the spec - - $ repsys rm SOURCES/bla-1.5.1.tar.bz2 - -- it will remove the symlink and run svn rm on - SOURCES-bin/bla-1.6.0.tar.bz2:: - - $ cd ../ # package top dir - $ repsys ci - -- repsys will commit the new tarball on SOURCES-bin/ and then on the rest - of the working copy - -repsys sync would perform these steps too. - -importing a package -------------------- - - $ repsys putsrpm mypkg.src.rpm - -- repsys will open the src.rpm - -- will look for tarballs inside SOURCES/ and import them to - http://svn.mandriva.com/svn/binrepo/cooker/mypkg/current/SOURCES/ - -- will move the tarballs out of SOURCES and import the remaining files to - http://svn.mandriva.com/svn/packages/cooker/mypkg/current/ - -- will do whatever else putsrpm already does - -TODO -===== - -First phase ------------ - -- upload -- markrelease -- putsrpm -- getsrpm - - -Second phase ------------- - -- up -- sync - -Rejected or postponed ideas -=========================== - -Use of a plain filesystem storage for the tarballs --------------------------------------------------- - -This was planned, then rejected. It becomes too complicated when thinking -about markrelease, and mapping SVN revisions in the main repository to -binaries versions in the "tarballs storage", basically requiring -implementing VCS-like features on top of filesystem. Would also require -implementing another authentication and access scheme. The main feature -would be ease of removing old binaries, which isn't much of a point because -we don't know precisely what and when we want to remove, so may end up not -removing much files anyway. - -Use of a plain unversioned filesystem storage for the tarballs --------------------------------------------------------------- - -Different than the previous one, this would mean not relying at all on -binary files history keeping. Structure could be something simple like:: - - packages/${pkg:0:1}/$pkg/$tarball - -This alternative does not suffice for Cooker, nor for supported distros, for -which we want history. It could, however, at some point be used for "very -old" distros, for which we may have lost interest in keeping *binaries* -history (package history will kept "forever" in the main SVN repository). -Alternatively, "resetting" an SVN binrepo (i.e. recreate the repository) to -contain only the latest tarballs would probably take about the same amount -of space, anyway... - -Open tarballs repository ------------------------- - -This idea is not really rejected. It does not go against splitting txtrepo -and binrepo, but rather complement this idea, where the -open-tarballs-repository would take the place of the binrepo. The txtrepo -would still be used +- the same way. This repository could be used -selectively, for packages where it makes sense, while most packages could be -kept "closed", still as tarballs. - -Use of externals for more seamless Subversion usage ---------------------------------------------------- - -This idea is not discarded, but it just provides easiness. OTOH, it makes -things more complicated: - -- markrelease: externals would have to be updated in order to make it point - to the tagged version in the binrepo, otherwise changes in - current@binrepo would change older releases; -- branching whole distro: even though subversion now supports "relative - externals", we would have to update the URLs for *every* package on the - distro, as the path to reach the binrepo spans the local distribution - directory; -- keeping externals up-to-date (as stated above and below) -- authentication and access control: only markrelease action done by the - build system should be allowed to change externals (so what about importing - new packages?) -- just a convenience, we don't need and shouldn't rely on externals for - running the build system, while most people will use the repositories via - Repsys, so why spend time to implement and keep it? -- "svn co" works transparently, cool, but "svn co -r N" does not, otherwise - every change in the binrepo would require svn:externals to be updated in - the respective package; -- it does not solve the problem of creating and handling symlinks between - SOURCES and SOURCES-bin. - -Keeping svn:externals updated for every package has almost the same cost of -keeping the `sha1.lst` updated, with the difference that in the latter we -would not have to update every package when creating distro branches. - -Use of "external" xdelta to save space on binaries --------------------------------------------------- - -But how? First idea is this could be done by defining a protocol and -assuming repository manipulation with repsys (for ease). Repsys could -xdelta tarballs and add it to SVN with a special filename, then use it when -checking out. Would require a policy/algorithm on when to ditch old whole -binaries, too (i.e. hopefully wouldn't need to be handled manually by the -maintainer). Also, this is something complemental to splitting the -repository, so we may do it later, for binrepos. - - -The Future -========== - -- Open tarballs repositories - - - suited for GIT, maybe multi-VCS - - incremental move - - not everything will be suited for this, must handle all cases or be - optional - -- Xdelta - - -Deployment -========== - -The current repository will be kept around for a while, in readonly state. -Initial binrepos will be populated with the binaries in the `current/` -branches of packages. - -The binrepo mappings config might be kept in a fixed subversion revision -property (revision 0?). - -Rough steps ------------ - -- check for agreement between subversion repository filters for binaries, - and repsys -- upgrade repsys everywhere - - - kenobi - - cluster nodes - - raoh - - titan - -- populate the binrepos for each supported distro, from a specific revision - of oldrepo, and mass commmit the corresponding `sha1.lst` in txtrepo for - every package - - - set svn:date revprop of the `sha1.lst` mass commit to the date of the - oldrepo revision - - before mass commiting the `sha1.lst`, possibly freeze oldrepo, check - for changes to sources after the selected revision, and update the - binrepo as necessary - -- check Secteam scripts, make needed changes to get them ready (non - critical) -- set up the new repositories - - - hook for filtering of disallowed (binary) files in main repository - - binrepos mappings - -- make the new main + binrepos repositories available, but readonly - - - keep new main repository in sync with the old repository with hooks - -- make current repository readonly and enable verification of sha1.lst at - package submission time - -- make sure new main repository and old repository are in sync - - - resync binrepos with the old repository as needed - -- final tests - - - change something - - submit - - etc. - -- make the new repositories writeable - |