This hierarchy
+holds the untouched repositories we want to merge. They can be found in
+`lib/` in the `repos` dict.
+
+`fs.crawl()` takes the info from `` and generates the paths we
+use to figure out what to download. Then `popDirs()` in `amprolla-init`
+uses these paths to actually create the directory structure and download
+the `Release` files from there. (NOTE: eventually it should also
+verify the GPG signatures from the `InRelease` file.)
+
+After the Release files are downloaded, they are parsed for their
+contents and all the files they list are downloaded serially (this
+should possibly be done in parallel).
+
+So now we have all the files we need to create a merge.
+
+
+
+The merge will be done in the `merged/` directory. To do a successful
+merge, these are the points that need to be accomplished:
+
+ * we must skip packages that are in ``
+ * for each `binary-$arch` we must create a new `Packages` file,
+   containing our merge:
+   * we parse every package and fill in a dict
+     * first priority 0, then 1, then 2, etc...
+     * if a package already exists from a higher-priority repo - it
+       gets skipped
+     * (NOT SURE): if a package from `banpkgs` is in a package's
+       dependency list - that package gets skipped
+   * once we've finished the iteration, we dump a new Packages file
+     from the updated dict
+
+
+
+After the initial merge, we need to watch for updates. My idea is to
+make amprolla poll the above-mentioned Release files, as they contain
+enough metadata for us to find out what changed. They also contain a
+date entry so we can see if there was actually an update without digging
+too deep.
+
+So if we figure out there was an update, we download the new file, parse
+it into a dict and compare it to the old version of that file/dict.
diff --git a/doc/dan-notes b/doc/dan-notes
@@ -0,0 +1,109 @@
+Ok... so the debian repo is essentially a directory hierarchy...
+
+Ok.. Do you understand the repo hierarchy? ie the main folder (in
+amprolla's case /merged) with sub folders 'dists' (for repo metadata) and
+'pool' (where the actual binary and source packages go)??
+forget about the "pool" folder, amprolla doesn't touch it...
+
+in "dists/" you have all the suites ie: jessie, ascii, ceres and all
+the stable, unstable and version symlinks.
+
+in the suite folder, you find the section folders: main contrib non-free
+and the files InRelease, Release and Release.gpg
+
+InRelease is just the clearsigned (inline PGP) version of the Release file - the gpg
+sig is the same as Release.gpg
+
+Anyway the Release file basically is a dictionary of most of the files
+in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
+is essentially RFC822 format, with a bunch of headers at the top that
+specify details about the Release of that suite.
+
+In the suite subdirectories you have a bunch of folders: binary-<arch>,
+which contain the Packages file, compressed copies of that, and a
+Release stanza, and similarly the source folder with the Sources file and
+compressed copies etc.
+
+the Contents files (currently not processed) are there too.
+(They contain a list of all the files in each package)
+
+there is also the i18n folder which contains the processed files.
+oops s/processed files/translation files/
+
+
+Amprolla takes several mirrors and merges them in order of priority
+starting with the highest priority. It first iterates over the structure
+to create its repo structure, ie dists/<suite>/<section>/ etc and then first
+copies the highest priority mirror's Packages and Sources files in and then for
+the other mirrors iterates over the Packages and Sources files and compares
+each package stanza for a match, and if there is a match on name then the highest
+priority mirror version is kept, if not then the package is added in.
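A minimal sketch of the priority merge described above, assuming each mirror's Packages file is already available as text. The names `parse_stanzas` and `merge_by_priority` are illustrative and are not amprolla functions; overrides, banned packages and Sources handling are left out.

```python
def parse_stanzas(text):
    """Split Packages text into {package name: stanza text}."""
    stanzas = {}
    for stanza in text.strip().split("\n\n"):
        for line in stanza.splitlines():
            if line.startswith("Package: "):
                stanzas[line[len("Package: "):].strip()] = stanza
                break
    return stanzas


def merge_by_priority(mirrors):
    """Merge Packages texts; `mirrors` is ordered highest priority first."""
    merged = {}
    for text in mirrors:
        for name, stanza in parse_stanzas(text).items():
            # On a name match the higher-priority stanza is already stored,
            # so it is kept; otherwise the package is added in.
            merged.setdefault(name, stanza)
    return merged
```

Passing the mirrors highest priority first means `setdefault` keeps the first stanza seen for a name and only adds names that are still missing, which follows the keep-or-add rule in the notes. Like the approach in the notes, it walks every stanza of every mirror on each run.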
+(This is where the inefficient model really shows up)
+
+
+After all the new Sources and Packages files are processed, the Release and
+InRelease files are generated by walking the hierarchy and adding those files in.
+
+There are a lot of complexities, part of which is in the design of amprolla.
+What I had started to do (and in describing it now, it seems obvious to me
+I should probably have started pretty much from scratch) is, instead of this
+iterative approach of compare and add or skip, to keep a cache of each mirror's
+last state, and then on each run create a delta between the last state and
+the current state.
+
+
+* and how does dak integrate in all of this?
+it doesn't. Dak is a standalone repository which just deals with the packages built by our CI
+* so it's the same as any debian repo
+Yup, slightly modified to handle our CI and some other tweaks
+and I checked and our version is in gdo too.
+
+
+anyway as I was saying about my approach re deltas:
+There are big efficiencies in this approach. For starters, we only download the InRelease or
+Release and Release.gpg file and, after verifying it, compare it to the previous state, and we
+can use the delta generated to pick what files are new, changed or removed from the repo.
+This means we only download the changed files in the repo for a start. And for the
+Packages and Sources files we create a delta list of changed stanzas to apply.
+
+Instead of building the entire repo from scratch, we apply the delta
+to a copy of our merged repo with handling for priority etc...
+
+What stumped me in the end is we actually should verify that we only have packages go in that
+have a matching source stanza, and we really need to process the contents and translations
+at the same time.
+
+I suspect that nextime realised this, which is why he started on amprolla2 which essentially
+replicates dak + amprolla function...
+
+I just realised, I forgot to mention the overrides processing in amprolla. In the very
+top of the dir in "merged/" is the "indices" folder that contains overrides. These
+files specify, for each Packages file, any metadata changes that need to be applied to
+package stanzas
+
+In debian there is an entry for every single deb package/source in the archive, making
+them very large. We did away with that to reduce the processing overhead it created.
+
+So we only have entries for those that need changing, usually to change priorities of
+systemd packages and remove recommends and suggests for systemd related packages.
+
+* are indices a part of the repo or only needed by amprolla?
+both. In debian, dak generates them and they are hand modified by the repo masters to
+apply needed fixes. With amprolla, we only create them for applying our own changes as needed.
+Technically they don't need to be in the repo, as they're not used by apt, but practically
+it's good to have them there.
+
+hmmm, I think I've cracked my problem...
+If I use the Sources delta to identify changed packages, I can use that to pick and apply
+the changed Packages stanzas, Contents and Translations. This would save lots of
+iterations, and I only need the delta processing to be done on the Sources files.
+Wow that would really speed things up
+
+The other benefit is we can side load packages this way too and use it to replace dak
+as well, as either a standalone repo or directly into the merged repo.
+And all without a hefty database. or the writeup
+
+you're welcome. It has helped me probably as much as you. I think it's
+turning into a full rewrite, but it seems a better design and possibly far easier to
+write from scratch.
+Anyway, it's nearly 3:30am here, so better get a couple hours sleep!
diff --git a/doc/directories b/doc/directories
@@ -0,0 +1,26 @@
+example aliases(?)
+
+
+1.0 -> jessie
+2.0 -> ascii
+ascii
+ascii-backports
+ascii-proposed-updates
+ascii-security
+ascii-updates
+ceres -> unstable
+jessie
+jessie-backports
+jessie-proposed-updates
+jessie-security
+jessie-updates
+sid -> unstable
+stable -> jessie
+stable-backports -> jessie-backports
+stable-proposed-updates -> jessie-proposed-updates/
+stable-updates -> jessie-updates/
+testing -> ascii
+testing-backports -> ascii-backports
+testing-proposed-updates -> ascii-proposed-updates/
+testing-updates -> ascii-updates
+unstable
diff --git a/doc/file-formats b/doc/file-formats
@@ -0,0 +1,96 @@
+example package:
+----------------
+    Package: apache2-mpm-event
+    Source: apache2
+    Version: 2.4.10-10+deb8u8
+    Installed-Size: 22
+    Maintainer: Debian Apache Maintainers <>
+    Architecture: amd64
+    Provides: httpd, httpd-cgi
+    Depends: apache2 (= 2.4.10-10+deb8u8)
+    Description: transitional event MPM package for apache2
+    Homepage:
+    Description-md5: e8836e8c2c34524fb11cc83011803e4e
+    Section: httpd
+    Priority: optional
+    Filename: pool/updates/main/a/apache2/apache2-mpm-event_2.4.10-10+deb8u8_amd64.deb
+    Size: 1520
+    MD5sum: e82aa67838c581d8217795dd6dbcb614
+    SHA1: a0d59e70ea06d145af068f46649ba80a75bb75cf
+    SHA256: e756d82ab7111a9c76291c91e344bbbddc0d9945704ad42f1140af0594c74cad
+
+example source:
+---------------
+
+    Package: apache2
+    Binary: apache2, apache2-data, apache2-bin, apache2-mpm-worker, apache2-mpm-prefork, apache2-mpm-event, apache2-mpm-itk, apache2.2-bin, apache2.2-common,
+     libapache2-mod-proxy-html, libapache2-mod-macro, apache2-utils, apache2-suexec, apache2-suexec-pristine, apache2-suexec-custom, apache2-doc, apache2-dev,
+     apache2-dbg
+    Version: 2.4.10-10+deb8u8
+    Maintainer: Debian Apache Maintainers <>
+    Uploaders: Stefan Fritsch <>, Arno Töll <>
+    Build-Depends: debhelper (>= 9.20131213~), lsb-release, dpkg-dev (>= 1.16.1~), libaprutil1-dev (>= 1.5.0), libapr1-dev (>= 1.5.0), libpcre3-dev, zlib1g-dev,
+     libssl-dev (>= 0.9.8m), perl, liblua5.1-0-dev, libxml2-dev, autotools-dev, gawk | awk
+    Architecture: any all
+    Standards-Version: 3.9.6
+    Format: 3.0 (quilt)
+    Files:
+     4cc0006932cbdb7a2597691505f39424 3277 apache2_2.4.10-10+deb8u8.dsc
+     44543dff14a4ebc1e9e2d86780507156 5031834 apache2_2.4.10.orig.tar.bz2
+     7d43a85707568321b98305fe61e386d5 555484 apache2_2.4.10-10+deb8u8.debian.tar.xz
+    Vcs-Browser:
+    Vcs-Git: git://
+    Checksums-Sha1:
+     dd6e773c03c22eb97beffe56e39b9f4b17eea31e 3277 apache2_2.4.10-10+deb8u8.dsc
+     00f5c3f8274139bd6160eda2cf514fa9b74549e5 5031834 apache2_2.4.10.orig.tar.bz2
+     a789b374f989dfe3734cb9b1895e7d2891b5fd04 555484 apache2_2.4.10-10+deb8u8.debian.tar.xz
+    Checksums-Sha256:
+     c20dc666e6192c3db716e1dfb60afed3248aabd9a2d3232301a11fe8d936dac6 3277 apache2_2.4.10-10+deb8u8.dsc
+     176c4dac1a745f07b7b91e7f4fd48f9c48049fa6f088efe758d61d9738669c6a 5031834 apache2_2.4.10.orig.tar.bz2
+     352be8c8245c162a9d97cf167a904fd1684904ffede565f23a654935701b40fa 555484 apache2_2.4.10-10+deb8u8.debian.tar.xz
+    Homepage:
+    Build-Conflicts: autoconf2.13
+    Package-List:
+     apache2 deb httpd optional arch=any
+     apache2-bin deb httpd optional arch=any
+     apache2-data deb httpd optional arch=all
+     apache2-dbg deb debug extra arch=any
+     apache2-dev deb httpd optional arch=any
+     apache2-doc deb doc optional arch=all
+     apache2-mpm-event deb oldlibs extra arch=any
+     apache2-mpm-itk deb oldlibs extra arch=any
+     apache2-mpm-prefork deb oldlibs extra arch=any
+     apache2-mpm-worker deb oldlibs extra arch=any
+     apache2-suexec deb oldlibs extra arch=any
+     apache2-suexec-custom deb httpd extra arch=any
+     apache2-suexec-pristine deb httpd optional arch=any
+     apache2-utils deb httpd optional arch=any
+     apache2.2-bin deb oldlibs extra arch=any
+     apache2.2-common deb oldlibs extra arch=any
+     libapache2-mod-macro deb oldlibs extra arch=any
+     libapache2-mod-proxy-html deb oldlibs extra arch=any
+    Directory: pool/updates/main/a/apache2
+    Priority: source
+    Section: httpd
+
+example translation:
+--------------------
+
+    Package: libactivemq-java
+    Description-md5: b7875bda385f5f6b4e36597054392132
+    Description-en: Java message broker core libraries
+     Apache ActiveMQ is a message broker built around Java Message Service (JMS)
+     API : allow sending messages between two or more clients in a loosely coupled,
+     reliable, and asynchronous way.
+     .
+     This message broker supports :
+     * JMS 1.1 and J2EE 1.4 with support for transient, persistent, transactional
+     and XA messaging
+     * Spring Framework, CXF and Axis integration
+     * pluggable transport protocols such as in-VM, TCP, SSL, NIO, UDP, multicast,
+     JGroups and JXTA
+     * persistence using JDBC along with journaling
+     * OpenWire (cross language wire protocol) and
+     Stomp (Streaming Text Orientated Messaging Protocol) protocols
+     .
+     This package contains a core Java library for ActiveMQ.
diff --git a/lib/ b/lib/
diff --git a/lib/ b/lib/
@@ -0,0 +1,198 @@
+#!/usr/bin/env python
+# copyright (c) 2017 - Ivan J. <>
+# see LICENSE file for copyright and license details
+
+amprolla = {
+    "spooldir": "./spool",
+    "sign_key": "fa1b0274",
+    "mergedir": "./merged",
+    "mergedsubdirs": [ "dists", "pool"],
+    "banpkgs": [ 'systemd', 'systemd-sysv' ]
+    #"checksums": [ 'md5sum', 'sha1', 'sha256', 'sha512' ]
+}
+
+repos = {
+    ## key name is priority, first is 0
+    0: {
+        "name": "DEVUAN",
+        "host": "",
+        "dists": "devuan/dists",
+        "pool": "devuan/pool",
+        "aliases": False,
+        "skipmissing": False
+    },
+    1: {
+        "name": "DEBIAN-SECURITY",
+        "host": "",
+        "dists": "dists",
+        "pool": "pool",
+        "aliases": True,
+        "skipmissing": True
+    },
+    2: {
+        "name": "DEBIAN",
+        #"host": "",
+        "host": "",
+        "dists": "debian/dists",
+        "pool": "debian/pool",
+        "aliases": True,
+        "skipmissing": False
+    }
+}
+
+suites = {
+    'jessie': [
+        'jessie',
+        'jessie-backports',
+        'jessie-proposed-updates',
+        'jessie-security',
+        'jessie-updates'
+    ],
+    'ascii': [
+        'ascii',
+        'ascii-backports',
+        'ascii-proposed-updates',
+        'ascii-security',
+        'ascii-updates'
+    ],
+    'unstable': [
+        'unstable'
+    ]
+}
+
+aliases = {
+    "DEBIAN-SECURITY": {
+        'ascii-security': 'testing/updates',
+        'jessie-security': 'jessie/updates'
+    },
+    "DEBIAN": {
+        'ascii': 'testing',
+        'ascii-backports': 'testing-backports',
+        'ascii-proposed-updates': 'testing-proposed-updates',
+        'ascii-updates': 'testing-updates'
+    }
+}
+
+categories = [ 'main', 'contrib', 'non-free' ]
+
+
+releases = {
+    "Release-jessie": {
+        "Suite": "stable",
+        "Codename": "jessie",
+        "Label": "Devuan",
+        "Version": "1.0",
+        "Description": "Devuan 1.0 Jessie (stable release)"
+    },
+    "Release-ascii": {
+        "Suite": "testing",
+        "Codename": "ascii",
+        "Label": "Devuan",
+        "Version": "2.0",
+        "Description": "Devuan 2.0 Ascii (testing release)"
+    },
+    "Release-unstable": {
+        "Suite": "unstable",
+        "Codename": "ceres",
+        "Label": "Devuan",
+        "Version": "x.x",
+        "Description": "Devuan x.x Ceres (unstable release)"
+    }
+}
+
+
+binaryarches = [
+    'all',
+    'alpha',
+    'amd64',
+    'arm64',
+    'armel',
+    'armhf',
+    'hppa',
+    'hurd-i386',
+    'i386',
+    'ia64',
+    'kfreebsd-amd64',
+    'kfreebsd-i386',
+    'mips',
+    'mips64el',
+    'mipsel',
+    'powerpc',
+    'ppc64el',
+    's390x',
+    'sparc'
+]
+
+installerarches = [
+    'amd64',
+    'arm64',
+    'armel',
+    'i386'
+]
+
+mainrepofiles = [
+    "InRelease",
+    "Release",
+    "Release.gpg"
+]
+
+pkgfmt = [
+    'Package:',
+    'Version:',
+    'Essential:',
+    'Installed-Size:',
+    'Maintainer:',
+    'Architecture:',
+    'Replaces:',
+    'Provides:',
+    'Depends:',
+    'Conflicts:',
+    'Pre-Depends:',
+    'Breaks:',
+    'Homepage:',
+    'Apport:',
+    'Auto-Built-Package:',
+    'Build-Ids:',
+    'Origin:',
+    'Bugs:',
+    'Built-Using:',
+    'Enhances:',
+    'Recommends:',
+    'Description:',
+    'Description-md5:',
+    'Ghc-Package:',
+    'Gstreamer-Decoders:',
+    'Gstreamer-Elements:',
+    'Gstreamer-Encoders:',
+    'Gstreamer-Uri-Sinks:',
+    'Gstreamer-Uri-Sources:',
+    'Gstreamer-Version:',
+    'Lua-Versions:',
+    'Modaliases:',
+    'Npp-Applications:',
+    'Npp-Description:',
+    'Npp-File:',
+    'Npp-Mimetype:',
+    'Npp-Name:',
+    'Origin:',
+    'Original-Maintainer:',
+    'Original-Source-Maintainer:',
+    'Package-Type:',
+    'Postgresql-Version:',
+    'Python-Version:',
+    'Python-Versions:',
+    'Ruby-Versions:',
+    'Source:',
+    'Suggests:',
+    'Xul-Appid:',
+    'Multi-Arch:',
+    'Build-Essential:',
+    'Tag:',
+    'Section:',
+    'Priority:',
+    'Filename:',
+    'Size:',
+    'MD5sum:',
+    'SHA1:',
+    'SHA256:'
+]
diff --git a/lib/ b/lib/
@@ -0,0 +1,106 @@
+#!/usr/bin/env python
+# copyright (c) 2017 - Ivan J. <>
+# see LICENSE file for copyright and license details
+
+import gzip
+import re
+import requests
+import time
+
+import config
+from log import die, notice
+
+def getTime(date):
+    return time.mktime(time.strptime(date, "%a, %d %b %Y %H:%M:%S %Z"))
+
+def getDate(relfile):
+    match = re.search('Date: .+', relfile)
+    if match:
+        line = relfile[match.start():match.end()]
+        relfile = line.split(': ')[1]
+    return relfile
+
+
+def parseRel(reltext):
+    hash = {}
+    match = re.search('SHA256:+', reltext)
+    if match:
+        line = reltext[match.start():-1]
+        for i in line.split('\n'):
+            if i == 'SHA256:' or len(i.split()) != 3: # XXX: hack
+                continue
+            hash[(i.split()[2])] = i.split()[0]
+    return hash
+
+
+def pkgParse(entry):
+    # for parsing a single package
+    values = re.split('\\n[A-Z].+?:', entry)[0:]
+    values[0] = values[0].split(':')[1]
+    keys = re.findall('\\n[A-Z].+?:', '\n'+entry)
+    both = zip(keys, values)
+    return {key.lstrip(): value for key, value in both}
+
+
+def parsePkgs(pkgtext):
+    # this parses our package file into a hashmap
+    # key: package name, value: entire package paragraph as a hashmap
+    map = {}
+
+    # TODO: consider also this approach
+    #def parsePkgs(pkgfilepath):
+    #with gzip.open(pkgfilepath, "rb") as f:
+    #    pkgs = f.read().split("\n\n")
+
+    pkgs = pkgtext.split("\n\n")
+    for pkg in pkgs:
+        m = re.match('Package: .+', pkg)
+        if m:
+            line = pkg[m.start():m.end()]
+            key = line.split(': ')[1]
+            map[key] = pkgParse(pkg)
+    return map
+
+
+def printPkg(map, pkgname):
+    try:
+        pkg = map[pkgname]
+        sin = []
+        for i in config.pkgfmt:
+            if i in pkg:
+                sin.append(i + pkg[i])
+        return sin
+    except KeyError:
+        die("nonexistent package")
+
+
+def dictCompare(d1, d2):
+    d1_keys = set(d1.keys())
+    d2_keys = set(d2.keys())
+    intersect_keys = d1_keys.intersection(d2_keys)
+    modified = {o : (d1[o], d2[o]) for o in intersect_keys if d1[o] != d2[o]}
+    return modified
+
+
+def compareRel(oldrel, newrel):
+    r = requests.get(newrel)
+    new = r.text
+    with open(oldrel, "r") as f:
+        old = f.read()
+
+    changes = {}
+    oldtime = getTime(getDate(old))
+    newtime = getTime(getDate(new))
+    if newtime > oldtime:
+        notice("Update available")
+        newhashes = parseRel(new)
+        oldhashes = parseRel(old)
+        changes = dictCompare(newhashes, oldhashes)
+    # k = file path, v = (new sha256, old sha256)
+    return changes
+
+
+#relmap = compareRel("../spool/dists/jessie/updates/Release", "")
+#print relmap
+#for k,v in relmap.iteritems():
+#    print(k)
diff --git a/lib/ b/lib/
@@ -0,0 +1,33 @@
+#!/usr/bin/env python
+# copyright (c) 2017 - Ivan J. <>
+# see LICENSE file for copyright and license details
+
+import config
+
+def crawl():
+    paths = {}
+    for i in range(0, len(config.repos)):
+        repo = config.repos[i]["name"]
+        basepath = config.repos[i]["dists"]
+        sts = []
+        for j in config.suites:
+            for k in config.suites[j]:
+                if config.repos[i]["aliases"] == True:
+                    if repo in config.aliases:
+                        try:
+                            suite = config.aliases[repo][k]
+                        except KeyError:
+                            if config.repos[i]["skipmissing"] == True:
+                                continue
+                            else:
+                                suite = k
+                else:
+                    suite = k
+                skips = [ "jessie-security", "ascii-security" ] ## XXX: HACK:
+                if repo == "DEBIAN" and suite in skips:
+                    continue
+                sts.append(suite)
+        paths[repo] = sts
+    return paths
+
+#print(crawl())
diff --git a/lib/ b/lib/
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# copyright (c) 2017 - Ivan J. <>
+# see LICENSE file for copyright and license details
+
+import sys
+
+def die(msg):
+    print("\033[1;31m[E] %s\033[0m" % msg)
+    sys.exit(1)
+
+def notice(msg):
+    print("\033[1;32m(*) %s\033[0m" % msg)
+    return
+
+def warn(msg):
+    print("\033[1;33m[W] %s\033[0m" % msg)
+    return
+
+def cleanexit():
+    notice("exiting cleanly...")
+    sys.exit(0)
diff --git a/lib/ b/lib/
@@ -0,0 +1,26 @@
+#!/usr/bin/env python
+# copyright (c) 2017 - Ivan J. <>
+# see LICENSE file for copyright and license details
+
+import requests
+
+import config
+from log import die, notice, warn, cleanexit
+
+
+def download(url, path):
+    print("\tdownloading: %s\n\tto: %s" % (url, path))
+    r = requests.get(url, stream=True)
+    if r.status_code == 404:
+        warn("not found!")
+        return
+    elif r.status_code != 200:
+        die("fail!")
+
+    with open(path, "wb") as f:
+        for chunk in r.iter_content(chunk_size=1024): # XXX: should be more on gbit servers
+            if chunk:
+                f.write(chunk)
+                #f.flush()
+    print("\033[1;32m . done\033[0m")
+    return
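The delta idea from doc/dan-notes (compare a cached mirror state with the current one, then pick out new, changed and removed files) can be sketched roughly as below. This is only an illustration under stated assumptions: `release_delta` is a hypothetical name that does not exist in this patch, and both arguments are assumed to be path-to-SHA256 dicts of the kind `parseRel()` returns. Unlike `dictCompare()`, which only reports keys present in both dicts, the sketch also reports additions and removals.

```python
def release_delta(old, new):
    """Compare two {file path: sha256} snapshots of a Release file.

    Hypothetical helper: returns (added, changed, removed), where `changed`
    maps each path to an (old sha256, new sha256) tuple.
    """
    added = {p: new[p] for p in new if p not in old}
    removed = {p: old[p] for p in old if p not in new}
    changed = {p: (old[p], new[p])
               for p in new
               if p in old and old[p] != new[p]}
    return added, changed, removed
```

Only the paths in `added` and `changed` would need to be downloaded again, and the paths in `removed` could be dropped from the merged copy.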