add readme; remove obsoleteness - amprolla

commit 1d9670ade4cc7c28dfd1c6de9bc14ca099be0c9d
parent 0454dba27c9b281b9eaca4b75184a8bc1f54cf15
Author: parazyd <parazyd@dyne.org>
Date:   Mon,  5 Jun 2017 21:47:59 +0200

add readme; remove obsoleteness

Diffstat:
A README.md  | 23 +++++++++++++++++++++++
D doc/dan-notes  | 109 -------------------------------------------------------------------------------
M orchestrate.py  | 6 ++----

3 files changed, 25 insertions(+), 113 deletions(-)
diff --git a/README.md b/README.md
@@ -0,0 +1,23 @@
+amprolla
+========
+
+amprolla is an apt repository merger originally intended for use with
+the [Devuan](https://devuan.org) infrastructure. This version is the
+third iteration of the software. The original version of amprolla was
+not performing well in terms of speed, and the second version was never
+finished - therefore this version has emerged.
+
+Dependencies
+------------
+
+### Devuan
+
+```
+gnupg2 python3-requests python3-gnupg
+```
+
+### Gentoo
+
+```
+app-crypt/gnupg dev-python/requests dev-python/python-gnupg
+```
diff --git a/doc/dan-notes b/doc/dan-notes
@@ -1,109 +0,0 @@
-Ok... so the debian repo is essentially a directory heirarchy...
-
-Ok.. Do you understand the repo heirarchy?  ie the main folder (in
-amprolla case /merged) with sub folders 'dist' (for repo metadata) and
-'pool' (where the actual binary and source packages go)??
-forget about the "pool" folder, amprolla doesn't touch it...
-
-in "dists/" you have all the suites ie: jessie, ascii, ceres and all
-the and stable, unstable  and version symlinks.
-
-in the suite folder, you find the section folders: main contrib non-free
-and files InRelease, Release and Release.gpg
-
-InRelease is just the pgp/smime version of the Release file - the gpg
-sig is the same as Release.gpg
-
-Anyway the Release file basically is a dictionary of most of the files
-in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
-is essentially RFC822 format, with a bunch of headers at the top that
-specify details about the Release of that suite.
-
-In the suite subdirectories you have a bunch of folders, binary-<arch>
-which contains the Packages file, and compressed copies of that, and a
-Release Stanza, and similar for the source folder with Sources file and
-compressed copies etc.
-
-the Contents files (currently not processed) are their too.
-(They contain a list of all the files in each package)
-
-their is also the i8n - folder which contains the processed files.
-oops s/processed files/translation files/
-
-
-Amprolla takes several mirrors and merges them in order of priority
-starting with the highest priority.  It firsts iterates over the structure
-to create it's repo structure, ie dists/<suite>/<section>/ etc and then first
-copies the highest priority mirror Packages and Sources files in and then for
-the othermirrors iterates over the Packages and Sources files and compares
-each package stanza for a match, and if there is a match on name then the highest
-priority mirror version is kept, if not then the package is added in.
-(This is where the inefficient model really shows up)
-
-
-After all the new Source and Packages files are processed then the Release and
-InRelease files are generated by walking the hierarchy and adding those files in.
-
-There is a lot of complexities, part of which is in the design of amprolla.
-What I had started to do, and in describing it now, it seems obvious to me
-I should probably have started pretty much from scratch is instead of this
-iterative approach of compare and add or skip is keep a cache of each mirrors
-last state, and then on each run create a delta between the last state and
-current state.
-
-
-* and how does dak integrate in all of this?
-it doesn't.  Dak is a standalone repository which just deals with the packages built by our CI
-* so it's the same as any debian repo
-Yup, slightly modified to handle our CI and some other tweaks
-and I checked and our version is in gdo too.
-
-
-anyway as I was saying about my approach re delta's:
-There are big efficiencies in this approach.  For starters, we only download the InRelease or
-Release and Release.gpg file and after verifying it, compare to the previous state, and we
-can use the delta generated to pick what files are new, changed or removed from the repo.
-This means we only download the changed files in the repo for a start.  And for the
-Packages and Sources files we create a delta list of changed stanza's to apply.
-
-Instead of building the entire repo from scratch, we apply the delta
-to a copy of our merged repo with handling for priority etc...
-
-What stumped me in the end is we actually should verify that we only have packages go in that
-have a matching source stanza and we really need to process the contents and translations
-at the same time.
-
-I suspect that nextime realised this which is why he started on amprolla2 which essentially
-replicates dak + amprolla function...
-
-I just realised, I forgot to mention the overrides processing in amprolla.  In the very
-top of the dir in "merged/" is the "indices" folder that contains overrides.  These
-files specify for each Packages files, any metadata changes that need to be applied to
-package stanza's
-
-In debian their is a entry for every single deb package/source in the archive making
-them very large.  We did away with that to reduce the overhead of processing it created. 
-
-So we only have entries for those that need changing, usually to change priorities of
-systemd packages and remove recommends and suggests for systemd related packages.
-
-* are indices a part of the repo or only needed by amprolla?
-both.  In debian, dak generates them and they are hand modified by the repo masters to
-apply needed fixes.  With amprolla, we only create them for applying our own changes as needed.
-Technically they don't need to be in the repo, as they're not used by apt, but practically
-it's good to have them there.
-
-hmmm,  I think I've cracked my problem...
-If I use the Sources delta to identify changed packages, I can use that to pick and apply
-the changed Packages stanza's Contents and Translations.  This would save lot's of
-iterations, and I only need the delta Processing to be done on the Sources files.
-Wow that would really speed things up
-
-The other benefit, is we can side load packages this way too and use it to replace dak
-as well as either a standalone repo or directly into the merged repo.
-And all without a hefty database. or the writeup
-
-your welcome.  It has helped me probably as much as you.  I think it's
-turning into a full rewrite, but seems better design and possibly far easier to
-write from scratch.
-Anyway, it's nearly 3:30am here, so better get a couple hours sleep!
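
The notes removed above describe the merge as: copy the highest-priority mirror's Packages entries first, then add a package from a lower-priority mirror only when no package of that name is already present. A minimal sketch of that rule follows. It is not the amprolla implementation: the function names `parse_stanzas` and `merge_by_priority` and the version strings are hypothetical, and it keys a dictionary on the package name instead of comparing stanzas pairwise as the notes describe.

```python
def parse_stanzas(text):
    """Split a Packages-style file into a list of {field: value} dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Continuation line belonging to the previous field.
                fields[key] += "\n" + line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def merge_by_priority(mirrors):
    """Merge parsed mirrors, given in order from highest to lowest priority.

    A package name that is already present is kept from the
    higher-priority mirror; unseen names are added.
    """
    merged = {}
    for stanzas in mirrors:
        for stanza in stanzas:
            merged.setdefault(stanza["Package"], stanza)
    return merged


# Hypothetical sample data: two mirrors, the first has priority.
devuan = parse_stanzas("Package: sysvinit\nVersion: 2.88-1+devuan1\n")
debian = parse_stanzas(
    "Package: sysvinit\nVersion: 2.88-1\n\nPackage: bash\nVersion: 4.4-5\n"
)
merged = merge_by_priority([devuan, debian])
```

Here `merged["sysvinit"]` comes from the first (higher-priority) mirror, while `bash`, which exists only in the second, is added unchanged.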
diff --git a/orchestrate.py b/orchestrate.py
@@ -2,7 +2,7 @@
 # see LICENSE file for copyright and license details
 
 """
-Module used to orchestrace the entire amprolla merge
+Module used to orchestrate the entire amprolla merge
 """
 
 from os.path import join
@@ -12,8 +12,6 @@ from lib.config import (arches, categories, suites, mergedir, mergesubdir,
                         pkgfiles, srcfiles, spooldir, repos)
 from lib.release import write_release
 
-# from pprint import pprint
-
 
 def do_merge():
     """
@@ -33,7 +31,7 @@ def do_merge():
 
     am = __import__('amprolla_merge')
 
-    p = Pool(4)
+    p = Pool(4)  # Set it to the number of CPUs you want to use
     p.map(am.main, pkg)
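
The new comment on `Pool(4)` suggests setting the pool size to the number of CPUs you want to use. One possible way to pick that number without hard-coding it is sketched below. The helper `worker_count` and the stand-in worker `square` are hypothetical and not part of this commit; only `multiprocessing.Pool` and `os.cpu_count` are standard-library calls.

```python
from multiprocessing import Pool
from os import cpu_count


def worker_count(requested=None):
    """Return a pool size: the requested value, capped to the CPUs available."""
    available = cpu_count() or 1  # cpu_count() may return None
    if requested is None:
        return available
    return max(1, min(requested, available))


def square(n):
    """Stand-in for am.main; any picklable top-level function works."""
    return n * n


if __name__ == "__main__":
    # Ask for 4 workers, but never more than the machine has.
    with Pool(worker_count(4)) as pool:
        print(pool.map(square, range(5)))
```

Capping the request keeps the pool from oversubscribing a machine with fewer than four cores, while still letting the caller limit how many cores the merge takes.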

	amprolla devuan's apt repo merger
	git clone git://parazyd.org/amprolla.git