<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/assets/feed.xslt"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title>Web logs of McSinyx</title>
<link>https://cnx.gdn</link>
<atom:link href="https://cnx.gdn/feed.xml" rel="self" type="application/rss+xml"/>
<description>Random write-ups packed with pop culture references</description>
<copyright><![CDATA[🄯 2019–2024 Nguyễn Gia Phong under CC BY-SA 4.0]]></copyright>
<language>en</language>
<generator>Franklin</generator>
<item>
  <title>Lazy Ragù</title>
  <link>https://cnx.gdn/blog/metsrc/index.html</link>
  <guid>https://cnx.gdn/blog/metsrc/index.html</guid>
  <description>Ragù in a slow cooker</description>
  <category>lyf</category><category>recipe</category>
  <pubDate>Thu, 10 Oct 2024 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="lazy_ragù">Lazy Ragù</h1>
<p>Craving &#39;em red meat sauces but too busy?  Behold, a ragù recipe so well optimized it can boil the blood of every pizzaliano within a 35-centimetro radius.<sup id="fnref:steffo">[1]</sup></p>
<h2 id="ingredients">Ingredients</h2>
<p>I like to think my sauce is somewhere between Bologna and Napoli even tho it sure ain&#39;t.  Imma list what I usually use down here in order of importance, but please adjust according to your local market and preference; to quote chef Jean-Pierre:</p>
<blockquote>
<p>If you don&#39;t have/like it, don&#39;t use it.</p>
</blockquote>
<ol>
<li><p>2 fist-sized <a href="https://en.wiktionary.org/wiki/oignon">onyi</a></p>
</li>
<li><p>1 big carrot</p>
</li>
<li><p>A Gordon-Ramsey dash of cooking oil<sup id="fnref:oil">[2]</sup></p>
</li>
<li><p>600 grammi of mince from any mammal<sup id="fnref:longpork">[3]</sup></p>
</li>
<li><p>3 tomatoes<sup id="fnref:tomato">[4]</sup></p>
</li>
<li><p>Salt and pepper</p>
</li>
<li><p>Half a bottle of drinkable red wine<sup id="fnref:measure">[5]</sup></p>
</li>
<li><p>&#39;bout the same amount of <a href="https://en.wikipedia.org/wiki/Stock">stock</a></p>
</li>
<li><p>A few cloves of garlic</p>
</li>
<li><p>A couple of ribs of celery</p>
</li>
<li><p>Some <a href="https://xkcd.com/282">thyme</a> and basil, preferably fresh</p>
</li>
<li><p>Some chili</p>
</li>
<li><p>Paprika powder<sup id="fnref:color">[6]</sup></p>
</li>
</ol>
<p>These should produce about 2 litri of sauce or 4–6 servings,<sup id="fnref:frozen">[7]</sup> for which some hard cheese&#40;s&#41; and butter are also nice to have.</p>
<p>The following tools are also needed:</p>
<ol>
<li><p>Bowls &#40;for storing the ingredients&#41;</p>
</li>
<li><p>Kitchen sink</p>
</li>
<li><p>Knife and cutting board</p>
</li>
<li><p>Fine grater &#40;like for cheese or citrus zest&#41;</p>
</li>
<li><p><a href="https://www.blender.org">Blender</a></p>
</li>
<li><p>Frying pan</p>
</li>
<li><p>Spatula or ladle</p>
</li>
<li><p>Sauce pan or pot</p>
</li>
<li><p>Slow cooker</p>
</li>
</ol>
<h2 id="base_prepping">Base Prepping</h2>
<p>Peel, wash and finely dice the onyi and caramelize &#39;em in a pan with some oil under low heat.  Stir sparingly: this process takes over half an hour and should start before anything else.<sup id="fnref:onyo">[8]</sup> A wee bit of salt would help draw out the moisture and accelerate the caramelization, which happens well above water&#39;s boiling point.</p>
<p>In the meantime, wash the other vegetables, finely grate the carrot, peel and mince the garlic, and dice the celery.  When all done, transfer all to the pan and continue frying until the onyi are soft and lightly browned.</p>
<h2 id="broth_prepping">Broth Prepping</h2>
<p>While waiting for the base veggies to caramelize, reduce the wine by half in a sauce pan to get rid of the alcohol.</p>
<p>Blend the tomatoes with the stock &#40;it&#39;s supposed to be a purée without the peel but ain&#39;t nobody got time for that&#41; and the rest of the spices.  Be conservative with the salt: you can always add more, but it&#39;s much harder to remove.</p>
<h2 id="meat_prepping">Meat Prepping</h2>
<p>Move the vegetables into the slow cooker from the pan and use it<sup id="fnref:pronoun">[9]</sup> to sear the minced meat under medium heat until the bottom side is golden brown &#40;no need to stir&#41;.  Parfry in multiple batches if necessary: if the pan is crowded it&#39;d take much longer to reach the desired temperature for the <a href="https://en.wikipedia.org/wiki/Maillard_reaction">Maillard reaction</a>.  Remember, <em>water is the enemy</em>, so leave it a way to retreat.<sup id="fnref:retreat">[10]</sup></p>
<h2 id="cooking">Cooking</h2>
<p>Scoop the meat into the slow cooker and pour in the reduced wine. Gradually add the tomato smoothie while mixing until the liquid barely covers the solids &#40;add more stock if necessary&#41;.</p>
<p>Turn the cooker on low and cook for 4–8 hours or until the meat is tender.</p>
<h2 id="serving">Serving</h2>
<p>Serve with short pasta or rice.  Grate in a generous amount of hard cheeses and drop in a smol slab of butter<sup id="fnref:butter">[11]</sup> and mix well for extra creaminess. Butter is an emulsion, so turn off the stove before adding it to prevent the butterfat from separating.</p>
<p>Plate with fresh basil and thyme and even moar grated cheese if you have any left.</p>
<h2 id="reflection">Reflection</h2>
<p>The recipe is not lazy enough to be handy, nor does it differ from a normal ragù enough to be rage bait, but I spent all that time typing it down, so I decided to keep the original title for the clickbait value.</p>
<table class="fndef" id="fndef:steffo">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">I&#39;m sorry, Steffo, but thou can&#39;t stop me.</td>
    </tr>
</table><table class="fndef" id="fndef:oil">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Vegetable oil, animal fat or even butter, smoke point doesn&#39;t matter.</td>
    </tr>
</table><table class="fndef" id="fndef:longpork">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Yes, <em>any</em> mammal.</td>
    </tr>
</table><table class="fndef" id="fndef:tomato">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">If not ripe and soft, add a few spoons of tomato paste.</td>
    </tr>
</table><table class="fndef" id="fndef:measure">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Measure carefully&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:color">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">Mostly for the color.</td>
    </tr>
</table><table class="fndef" id="fndef:frozen">
    <tr>
        <td class="fndef-backref">[7]</td>
        <td class="fndef-content">It&#39;ll last 17 years in the freezer, so just make a full pot.</td>
    </tr>
</table><table class="fndef" id="fndef:onyo">
    <tr>
        <td class="fndef-backref">[8]</td>
        <td class="fndef-content">Onyo is always number first&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:pronoun">
    <tr>
        <td class="fndef-backref">[9]</td>
        <td class="fndef-content">The pan, not the slow cooker.</td>
    </tr>
</table><table class="fndef" id="fndef:retreat">
    <tr>
        <td class="fndef-backref">[10]</td>
        <td class="fndef-content">圍師必闕。</td>
    </tr>
</table><table class="fndef" id="fndef:butter">
    <tr>
        <td class="fndef-backref">[11]</td>
        <td class="fndef-content">Butter makes everything butter.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/metsrc@cnx%3E&Subject=Re: Lazy Ragù">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/metsrc@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/metsrc/comments.xml</wfw:commentRss>
</item>
<item>
  <title>GNU as a Router, the Canonical Way</title>
  <link>https://cnx.gdn/blog/route/index.html</link>
  <guid>https://cnx.gdn/blog/route/index.html</guid>
  <description>How to set up an Ubuntu system as a router</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Sat, 03 Aug 2024 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="gnu_as_a_router_the_canonical_way">GNU as a Router, the Canonical Way</h1>
<p>A while ago I noticed that my ISP leases IPv4 addresses out indefinitely. It was everything I&#39;d ever wanted and I gotta seize it to truly <em>self</em>-host. As an experiment, I started on something cheaper, like a single-board compooter. In 2024, support for general-purpose RISC-V chips began to ripen, so naturally due to FOMO, I bought a board with the JH-7110.  Boy, was that a mistake&#33; While the bootloaders&#39; support had been well <a href="https://rvspace.org/en/project/JH7110_Upstream_Plan">upstreamed</a>, certain essential features like PCIe &#40;for NVMe&#41; had yet to reach a mainline Linux release, and it was even worse on the BSDs.  I ended up flashing the <em>only</em> distribution with official support <a href="https://loa.loang.net/chung/D16T24MXDP3T.3BR1X04I90CGT@guix/t">at the time</a>, Ubuntu.</p>
<p>Funny enough, after over a decade of daily driving <a href="https://www.gnu.org">GNU</a>, twas the first time I installed Ubuntu on a machine of my own.  At the time of writing, the reason for that was more apparent than ever: Canonical had been forcing Snap<sup id="fnref:snap">[1]</sup> down the users&#39; throats, even on the <em>server</em> edition. Thankfully Snap was still managed by APT and twas easy enough to remove it and prevent it from coming back.  Another annoyance was the lack of manual pages in the minimized installation <em>and</em> that the official way to enable them is through a script that also installs other bloat SMFH &#40;the script is quite short and the actually necessary commands can be trivially found; I&#39;d rather they were documented instead&#41;.</p>
<p>That being said, not everything Ubuntu includes due to NIH is bad. Unity &#40;not the game engine, which is proprietary like the Snap server&#41; was loved by many; and this article is basically an appreciation post for some others: <a href="https://netplan.io">Netplan</a> and <a href="https://launchpad.net/ufw">ufw</a>.  Before diving in, lemme finish the story to give you the full context of this setup. The SBC is the VisionFive 2, which is blessed with plenty of I/O:</p>
<ul>
<li><p>8 GB of memory</p>
</li>
<li><p>4 USB 3.0 type-A ports</p>
</li>
<li><p>2 RJ45 ports &#40;1 Gb and 100/10 Mb&#41;</p>
</li>
<li><p>1 M.2 slot &#40;I used this as an excuse to buy a larger SSD and put the old 256 GB one here&#41;</p>
</li>
<li><p>1 eMMC slot<sup id="fnref:mmc">[2]</sup> &#40;eMMC are cheap, got one also with 256 GB&#41;</p>
</li>
<li><p>1 TF slot</p>
</li>
<li><p>40 pin GP&#40;and predefined-purpose&#41;IO</p>
</li>
<li><p>Other stuff for interfacing with humen like HDMI, audio jack, etc.</p>
</li>
</ul>
<p>Initially, my plan for the SBC was to host services unlisted on the <a href="https://loang.net">loang network</a>.  Official services were not considered because my home network has no IPv6 and sometimes I&#39;d like to have most of the bandwidth for meself.  Shortly afterwards, I also purchased a somewhat beefy desktop compooter with even more I/O, especially a bunch of SATA ports, which are a lot more attractive than connecting hard di&#42;ks via USB.  On the other hand, the SBC barely consumes any electricity, well under 10 W with the NVMe drive, a Wi-Fi dongle and a fan connected.  Since it costs virtually nothing to keep it up 24/7, I decided to hand it the following two tasks:</p>
<ul>
<li><p>Reverse proxying services running on more powerful machines in the local network.</p>
</li>
<li><p>Acting as a virtual router between nodes I manage. This is particularly useful for tunneling to my work network and accessing the servers, allowing me to work remotely with low latency.</p>
</li>
</ul>
<p>Setting up the VPN with WireGuard was relatively easy, so I assumed swapping the SBC in for the home router couldn&#39;t be too hard.  Once again, I <a href="https://antifandom.com/how-i-met-your-mother/wiki/Knight_Vision">chose poorly</a>: this little project cost me so many sleepless nights that I figured I should note down what I learned here in case it can save someone else from the same pain.  <strong>Do not take inspiration from this&#33;</strong></p>
<div class="franklin-toc"><ol><li>Connecting to the Internet</li><li>Local Networking</li><li>Wireless Access Point</li><li>Name Resolution</li></ol></div>
<h2 id="connecting_to_the_internet">Connecting to the Internet</h2>
<p>My landlord handles the contract with the ISP so I don&#39;t know the details of the subscription, but there&#39;s certainly no IPv6 nor any static IPv4 address. Bandwidth to datacenters in the region is approximately 100 Mb/s and the wall socket connects to a Cat 5e cable.  I know about the latter because whatever dumb ass did the last maintenance wired that to another short one dangling from the wall socket<sup id="fnref:futa">[3]</sup>, and after getting stabbed in the eyes for months I finally opened it up and made the socket a proper socket.</p>
<p>It wouldn&#39;t make the slightest difference, but I connect the SBC&#39;s 1 Gb port &#40;identified in Ubuntu as end0&#41; to the Internet and the slower one &#40;end1&#41; to my desktop on the local network. Thankfully no <a href="https://docs.fsfe.org/en/teams/router-freedom-tech-wiki">special setup</a> was needed and here is the entire Netplan configuration to connect to the outside world:</p>
<pre><code class="language-yaml">network:
  ethernets:
    end0:
      dhcp4: true
  renderer: networkd
  version: 2</code></pre>
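<p>Netplan doesn&#39;t pick up edits on its own; assuming stock Ubuntu tooling, the config can be tried out with an automatic rollback in case I cut myself off:</p>
<pre><code class="language-sh"># reverts automatically unless confirmed within 120 s
netplan try
# once happy, apply for real
netplan apply</code></pre>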
<h2 id="local_networking">Local Networking</h2>
<p>For simplicity&#39;s sake, I decided to use the same subnet for both Ethernet and Wi-Fi under a bridge br0, where addressing and routing are configured:</p>
<pre><code class="language-yaml">network:
  bridges:
    br0:
      addresses:
        - 192.168.147.254/25
      interfaces:
        - end1
      routes:
        - from: 192.168.147.128/25
          on-link: true
          to: 0.0.0.0/0
          type: nat
          via: 192.168.147.254
  ethernets:
    end1:
      dhcp4: false</code></pre>
<p>As Netplan doesn&#39;t configure any DHCP server, that&#39;s done separately by udhcpd from busybox:</p>
<pre><code class="language-plaintext">interface br0
start 192.168.147.128
end 192.168.147.253
max_leases 126
option subnet 255.255.255.128
option router 192.168.147.254</code></pre>
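<p>As a sanity check, the pool boundaries agree with the lease count, since the range is inclusive at both ends:</p>
<pre><code class="language-sh"># addresses .128 through .253, both inclusive
echo $(( 253 - 128 + 1 ))  # 126, matching max_leases</code></pre>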
<p>I couldn&#39;t seem to find concrete information on the ports used by DHCP, so I opened the firewall for UDP on both 67 and 68 &#40;I swear this isn&#39;t an engagement bait to test out the new mailing list&#41;:</p>
<pre><code class="language-plaintext">ufw allow in on br0 to any port 67 proto udp
ufw allow in on br0 to any port 68 proto udp</code></pre>
<h2 id="wireless_access_point">Wireless Access Point</h2>
<p>Thanks to systemd, the Wi-Fi dongle is recognized as wlx600dd0g8b33f. Yes, that abomination of a name includes the chip&#39;s full MAC address. That being said, I&#39;d like to stick to the basics of a systemd/Linux distro.  Netplan only supports Wi-Fi hotspots through NetworkManager, not systemd-networkd, so the interface had to be declared as Ethernet:</p>
<pre><code class="language-yaml">network:
  bridges:
    br0:
      interfaces:
        - wlx600dd0g8b33f
  ethernets:
    wlx600dd0g8b33f:
      dhcp4: false</code></pre>
<p>Actual wireless connectivity is handled by hostapd:</p>
<pre><code class="language-ini">interface&#61;wlx600dd0g8b33f
bridge&#61;br0
ssid&#61;YΦ
utf8_ssid&#61;1
country_code&#61;KR
channel&#61;6
ieee80211d&#61;1
ieee80211h&#61;1
ieee80211n&#61;1
hw_mode&#61;g
wmm_enabled&#61;1
wpa&#61;2
wpa_pairwise&#61;TKIP
wpa_passphrase&#61;just enter random characters</code></pre>
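<p>Note that Ubuntu ships hostapd masked out of the box; assuming the config above is saved to /etc/hostapd/hostapd.conf and pointed to by DAEMON_CONF in /etc/default/hostapd, something like this brings the access point up:</p>
<pre><code class="language-sh">systemctl unmask hostapd
systemctl enable --now hostapd</code></pre>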
<h2 id="name_resolution">Name Resolution</h2>
<p>My ISP is <a href="https://www.tomshardware.com/tech-industry/cyber-security/south-korean-telecom-company-attacks-torrent-users-with-malware-over-600000-people-report-missing-files-strange-folders-and-disabled-pcs">known to be evil</a> so I&#39;d rather rely on more reputable resolvers like <a href="https://opennic.org">OpenNIC</a>, which also offers free-of-charge &#40;&#33;&#41; domain names. Most of their <a href="https://servers.opennic.org">tier 2</a> servers are located on the other side of the globe &#40;200 to 300 ms RTT&#41;, so a local cache is almost required.  <a href="https://pymumu.github.io/smartdns/en">SmartDNS</a> seems to be the best fit for this purpose, as it queries upstream servers simultaneously and also checks for the IP with the lowest RTT among the results. Since I don&#39;t trust my ISP, connections to the upstream servers are encrypted:</p>
<pre><code class="language-plaintext">bind :53@br0
server-tls 51.254.162.59 -host-name ns1-dot.iriseden.fr
server-tls 202.61.197.122 -host-name dns.furrydns.de
server-tls 80.152.203.134 -host-name dot.kekew.info
server-tls 178.201.248.159 -host-name dot.kekew.info
server-tls 178.201.248.160 -host-name dot.kekew.info
server-tls 95.216.99.249 -host-name dns.froth.zone</code></pre>
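<p>Assuming dig from dnsutils is installed on a machine in the subnet, the cache can be spot-checked by querying it twice; the second reply should come back with a near-zero query time:</p>
<pre><code class="language-sh">dig @192.168.147.254 example.com
dig @192.168.147.254 example.com</code></pre>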
<p>For the router itself, the nameserver is set in /etc/resolv.conf and Netplan is told not to change it:</p>
<pre><code class="language-yaml">network:
  ethernets:
    end0:
      dhcp4-overrides:
        use-dns: false</code></pre>
<p>After ufw is configured to allow UDP traffic on port 53 on br0, udhcpd is instructed to advertise this local DNS server:</p>
<pre><code class="language-plaintext">option dns 192.168.147.254</code></pre>
<p>I might consider blocking ads at the domain-name level someday, but for now uBlock Origin is working well enough on my systems and I rarely have people over, especially not for looking at <em>their</em> electronic devices.</p>
<table class="fndef" id="fndef:snap">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Not <a href="http://snap.berkeley.edu">the good one</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:mmc">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Innovation&#39;s gone full circle, <em>eMMC</em> is short for <em>embedded MMC</em>.</td>
    </tr>
</table><table class="fndef" id="fndef:futa">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Basically a futanari of the RJ45 world.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/route@cnx%3E&Subject=Re: GNU as a Router, the Canonical Way">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/route@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/route/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Best Ways to Watch YouTube Videos</title>
  <link>https://cnx.gdn/blog/youtu/index.html</link>
  <guid>https://cnx.gdn/blog/youtu/index.html</guid>
  <description>Do you know de wey?  Lemme show you de wey&#33;</description>
  <category>fun</category><category>recipe</category><category>net</category><category>nix</category><category>clipboard</category>
  <pubDate>Wed, 17 Jan 2024 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="best_ways_to_watch_youtube_videos">Best Ways to Watch YouTube Videos</h1>
<p>In today&#39;s episode of <a href="https://www.alexmolas.com/2023/07/15/nobody-cares-about-your-blog.html">guides nobody asked for and likely having been covered by someone more qualified</a>, lemme show you the <em>correct</em> ways to view videos hosted on YouTube and other hostile, tracker-riddled hellscapes.  Whilst I despise Google&#39;s mass surveillance practices, it stores a large proportion of culturally significant videos and clips that would be difficult to mirror to user-respecting services due to copyright.  Hell, even YouTube doesn&#39;t have the right to distribute many of them in the first place.</p>
<p>Because of YouTube&#39;s circumvention of advertisement blockers, the ad-blocking arms race finally caught mainstream media attention and tis kool to talk about that now.  Hence I&#39;m happy to jump on the bandwagon, albeit a wee bit late, but this ain&#39;t just that. Since I feed you poison—over 4&#37; of the pages linked from my site are on YouTube—the least I can do is sell you my cures.</p>
<h2 id="using_a_proper_media_player">Using a Proper Media Player</h2>
<p>The most popular solutions are either to use a good blocker on a browser with &#40;supposedly&#41; long-term support for <a href="https://github.com/uBlockOrigin/uBlock-issues/issues/338#issuecomment-1332300191">Manifest V2</a> like <a href="https://ublockorigin.com">uBlock Origin</a> on Firefox, or to use alternative front-ends such as <a href="https://invidious.io">Invidious</a> or <a href="https://github.com/TeamPiped/Piped">Piped</a>.  Although uBlock Origin is essential for a pleasant experience on the modern interwebs and alternative front-ends offer the best UX for browsing videos, in-browser and service-specific media players are inferior to programs properly designed for a decent playback experience.</p>
<p>My favorite has been <a href="https://mpv.io">mpv</a> for as long as I can remember, as it makes it easy to adjust video brightness/contrast/etc., playback speed, subtitle size and placements, and to overamplify quiet audios.  Out of the box, it integrates with <a href="https://github.com/yt-dlp/yt-dlp">yt-dlp</a>, a time shifter with support for <a href="https://github.com/yt-dlp/yt-dlp/raw/master/supportedsites.md">most online media services</a>. Just drop the URL into an mpv window and <a href="https://www.youtube.com/watch?v&#61;_FNzL5nW_u4">boom</a>, it werks&#33;</p>
<p>Either <a href="https://uno.starshard.studio/notes/9nmgldtdgghu8m2n">drag-and-drop</a> or invoking <code>mpv &#36;url</code> is quite convenient, but not that close to following an anchor, is it?  You&#39;d need to first open mpv or a program launcher<sup id="fnref:launch">[1]</sup>, then drag the URL there, or perhaps copy and paste it for the latter cases. What if you <a href="https://ziglang.org/perf">gotta go fast</a>, aye?  As a <a href="https://video.hardlimit.com/c/morbiwars">hedgehog-maxxer</a> meself, of course I can do better, and here&#39;s how.</p>
<h2 id="with_a_browser_add-on">With a Browser Add-on</h2>
<p>While drafting this article, I noticed that the <em>ff2mpv</em> extension I was using had <a href="https://github.com/woodruffw/ff2mpv/commit/2397193b36e6.patch">technically been non-free</a> for a while.  While I understand and respect the author&#39;s noble intention against violence, I believe that, due to the power imbalance, discrimination never ends up helping the oppressed enough for the exclusion&#39;s false positives to be worth it.</p>
<p>For this reason, I switched to <a href="https://addons.mozilla.org/en-US/firefox/addon/iina-open-in-mpv">Open in mpv</a> and recommend it instead. The usage is practically the same: open context menu at the video URL and select <em>Open this link in mpv</em>.  The internal mechanism is a bit different though, and because it influences the installation process, I will try to briefly explain <a href="https://www.youtube.com/watch?v&#61;1Fl2sMV7Hcc">how it works</a>.</p>
<p>The way <em>Open in mpv</em> works is a bit convoluted.  First, it wraps the specified URL in an <code>mpv</code> scheme.  The new URL, starting with <code>mpv://</code>, is then passed back to Firefox, which must have been configured to open it with the native program <code>open-in-mpv</code>.  This program parses the URL into the equivalent mpv command and executes it.  If you are not on NixOS, see the <a href="https://github.com/Baldomo/open-in-mpv/raw/master/README.md">extension&#39;s README</a> to set it up yourself.</p>
<p>Otherwise, it can be declared in <a href="https://nixos.org/manual/nixos/stable/options#opt-programs.firefox.policies">configuration.nix&#40;5&#41;</a> as follows. The declarations should be self-explanatory after referencing Firefox&#39;s documentation for <a href="https://mozilla.github.io/policy-templates">policies.json</a>.  If you have trouble finding an extension&#39;s ID and download URL, search for it in <a href="https://gnuzilla.gnu.org/mozzarella">Mozzarella</a>.</p>
<pre><code class="language-nix">&#123; pkgs, ... &#125;:
&#123;
  programs.firefox &#61; &#123;
    enable &#61; true;
    policies &#61; &#123;
      ExtensionSettings.&quot;&#123;d66c8515-1e0d-408f-82ee-2682f2362726&#125;&quot; &#61; &#123;
        default_area &#61; &quot;menupanel&quot;;
        installation_mode &#61; &quot;normal_installed&quot;;
        install_url &#61;
          &quot;https://addons.mozilla.org/firefox&quot;
          &#43; &quot;/downloads/latest/iina-open-in-mpv/latest.xpi&quot;;
      &#125;;
      Handlers.scheme.mpv &#61; &#123;
        action &#61; &quot;useHelperApp&quot;;
        ask &#61; false;
        handlers &#61; &#91; &#123;
          name &#61; &quot;open-in-mpv&quot;;
          path &#61; &quot;&#36;&#123;pkgs.open-in-mpv&#125;/bin/open-in-mpv&quot;;
        &#125; &#93;;
      &#125;;
    &#125;;
  &#125;;
&#125;</code></pre>
<p>Even though Mozzarella is supposed to only show libre add-ons, be aware that the metadata it crawls from <a href="https://addons.mozilla.org">addons.mozilla.org</a> might not always be <a href="https://issues.guix.gnu.org/68361">correct</a>.  Ideally, browser extensions should be packaged in the distribution&#39;s repository, but packaging discipline is not exactly NixOS&#39;s strong suit.  I will probably post an update on how to declare <code>policies.json</code> in Guix once I figure that out.</p>
<h2 id="from_a_feed_reader">From a Feed Reader</h2>
<p>Now we can properly watch videos while browsing the web, but subscribing to YouTube channels on its web interface would require creating an account and subjecting oneself to more surveillance.  Fortunately, at the time of writing, YouTube still provides Atom <a href="https://en.wikipedia.org/wiki/Web_feed">feeds</a> for syndication. Funny enough, they are advertised on the channel pages as RSS:</p>
<pre><code class="language-html">&lt;link rel&#61;&quot;alternate&quot;
      type&#61;&quot;application/rss&#43;xml&quot;
      title&#61;&quot;RSS&quot;
      href&#61;&quot;https://www.youtube.com/feeds/videos.xml?channel_id&#61;…&quot;&gt;</code></pre>
<p>The referenced feed employs <a href="https://www.rssboard.org/media-rss">Media RSS</a> to communicate the video URL. This extension is widely supported by feed readers, as is the previously mentioned feed-discovery mechanism.  I use <a href="https://lzone.de/liferea">Liferea</a>, which allows me to directly paste the YouTube channel&#39;s URL<sup id="fnref:ytc">[2]</sup>, and displays each video&#39;s description, thumbnail and enclosed media, e.g.</p>
<p><img src="https://cnx.gdn/assets/liferea-youtube.png" alt="Liferea in action" /></p>
<p>For each MIME type, enclosures can be configured to be opened by a user-preferred program.  In this case, I set <code>mpv --ytdl-format&#61;b</code> for <code>application/x-shockwave-flash</code> &#40;a reminiscence of a time when browsers needed <a href="https://ruffle.rs">Flash</a> to play videos and animations&#41; for the <em>second</em> best quality to save some bandwidth.  YouTube encodes the highest-resolution video separately from the audio, so the best combined format <code>b</code> is one level lower than yt-dlp&#39;s default of best video and best audio together.</p>
<h2 id="via_clipboard_integration">Via Clipboard Integration</h2>
<p>People also share videos with me via instant messaging. I find it cumbersome to open the URL in the browser then redirect it to the media player, so the clipboard is used as the bridge instead. To do this, I simply create a key binding to the command below.<sup id="fnref:wl">[3]</sup></p>
<pre><code class="language-sh">mpv --ytdl-format&#61;b &quot;&#36;&#40;xclip -out -selection clipboard&#41;&quot;</code></pre>
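<p>On Wayland, assuming wl-clipboard is installed, the equivalent binding would be:</p>
<pre><code class="language-sh">mpv --ytdl-format=b "$(wl-paste)"</code></pre>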
<h2 id="musing">Musing</h2>
<p>There, I shared how I do it so <a href="https://cnx.gdn/blog/youtu">you can too</a>&#33;  If they seem needlessly complex, you share my disappointment on the UX evolution of the mainstream web.  I dream of a more semantic web, not necessarily web 3.0, perhaps just more explicitly typed, where e.g. a YouTube URL for embedding would be a <code>video/webm</code> instead of a <code>text/html</code>.</p>
<p>If <code>mailto</code> URIs can launch our email client, and social media pages can bug us to open the post in their own app, why can&#39;t we have interoperable media handling?  Maybe we should, but I&#39;m not sure if we can. <a href="https://www.searchenginejournal.com/youtube-is-showing-ads-on-non-monetized-channels/388674">Greed</a> stands in our way.  Providers force us to use their proprietary <a href="https://pluralistic.net/2023/01/21/potemkin-ai">malware</a> to consume their service.  <a href="https://www.defectivebydesign.org">DRM</a> has become the foundation of media distribution.  Grassroots movements like <a href="https://framasoft.org">Framasoft</a> might never reach mainstream status.</p>
<p>I don&#39;t mean to tell you to give up though, just to direct your energy to where it matters.  Spend less on developing <a href="https://sr.ht/~benbusby/farside">alternative front-ends</a> and more on ethical replacements, bridges and inviting people over. We need more <a href="https://sepiasearch.org">videos</a>, more <a href="https://www.funkwhale.audio">music</a>, more <a href="https://castopod.org">podcasts</a>, more <a href="https://en.wikipedia.org/wiki/Open_access">knowledge</a>, better <a href="https://xmpp.org">instant</a> <a href="https://matrix.org">messaging</a>, better <a href="https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes">search engines</a>, better <a href="https://browser.mt">translations</a>, better <a href="https://www.home-assistant.io">home</a> <a href="https://platypush.tech">automation</a>, and whatnot. Against all odds, maybe things will finally start to improve even for those outside of our bubble.  <a href="https://fe.disroot.org/@mcsinyx/posts/ALaW77HgCSPq4pLxpo">Perchance.</a></p>
<table class="fndef" id="fndef:launch">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Or a terminal emulator</td>
    </tr>
</table><table class="fndef" id="fndef:ytc">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Something starting with https://www.youtube.com/@</td>
    </tr>
</table><table class="fndef" id="fndef:wl">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">On <a href="https://wayland.social/@compositor/110768798303454842">Wayland</a>, replace <code>xclip</code> with something equivalent</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/youtu@cnx%3E&Subject=Re: Best Ways to Watch YouTube Videos">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/youtu@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/youtu/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Slow Cooked Pork and Eggs</title>
  <link>https://cnx.gdn/blog/kholow/index.html</link>
  <guid>https://cnx.gdn/blog/kholow/index.html</guid>
  <description>Sino-Vietnamese caramelized pork and eggs, but by a slow cooker</description>
  <category>lyf</category><category>recipe</category>
  <pubDate>Sat, 03 Jun 2023 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="slow_cooked_pork_and_eggs">Slow Cooked Pork and Eggs</h1>
<p><a href="https://en.wikipedia.org/wiki/Caramelized_pork_and_eggs">Thịt kho tàu</a>, literally <em>Chinese braised pork</em>, is one of the most common Vietnamese dishes, to be found anywhere from military camps<sup id="fnref:USTH">[1]</sup> to fancy restaurants, anytime from family dinners to new year holidays. While it originated in southeastern China, over the years it adopted local ingredients such as <a href="https://en.wikipedia.org/wiki/Fish_sauce">fish sauce</a> and coconut flesh and probably does not taste the same.</p>
<p>Due to time constraints, home cooks usually rely on fattier cuts such as the belly to maintain the juiciness.  The downside is that the excess fat can quickly tell the liver to tune down the appetite after a few meals.<sup id="fnref:ngán">[2]</sup>  This put me in an awkward position, since I was conditioned to feel wrong about braising a serving size of anything &#40;I was living alone when typing this&#41;.</p>
<p>Though, as said three sentences ago, leaner cuts can be as tender when cooked longer.  This is where slow cookers come to the rescue: they maintain temperature between 80 and 100°C and after eight hours even the toughest cuts will just fall apart.  The best part? No supervision needed.  Water doesn&#39;t even boil at that temperature, so accidentally burning food is never a worry.</p>
<p>For the ease of maintenance, I&#39;d recommend slow cookers whose pot and lid can be taken out for cleaning.  The pot should also be relatively large &#40;3L<sup id="fnref:imperial">[3]</sup> or more&#41; if you want to make other vegetable-rich stews.</p>
<h2 id="ingredients">Ingredients</h2>
<p>As a <a href="https://commons.wikimedia.org/wiki/File:2013-06-08_mechanical_fan_for_hot_air_ballon.jpg">big fan</a> of <a href="https://chefjeanpierre.com">Chef Jean-Pierre</a>, I eyeball the amount of pretty much all ingredients here.  The amount of pork and eggs should be enough to at least fill the bottom of the pot.  I prefer quail eggs for their bite size, and leaner cuts of pork with some intramuscular fat and tendons.  Hocks, hams and shoulders are all good and cheap candidates. Leave the skin on, the gelatine helps thicken the sauce.  I like equal amounts of eggs and meat.</p>
<p>For seasoning, you&#39;ll need fish sauce, sugar, whole black pepper, and optionally shallot, garlic and hard coconut meat.</p>
<h2 id="preparation">Preparation</h2>
<p>Boil the eggs and peel them.  Layer them in the pot.  Peel and slice one or two cloves of garlic and sprinkle them in there.  If you have coconut meat, julienne it<sup id="fnref:coconut">[4]</sup> and throw it in as well.</p>
<p>Cut the pork into bite-size dice.  Place the skin facing up or toward the side of the pot.  You want &#40;some of it&#41; to be drier for texture variety. Peel a few cloves of shallot and embed them between the dice of pork.</p>
<h2 id="cooking">Cooking</h2>
<p>Pour a very thin layer of sugar into a saucepan and heat it up at medium low to make some dark caramel &#40;too low you&#39;ll just get liquid sugar and too high you&#39;ll burn it faster than the <a href="https://nixnet.social/notice/AL2XqGNF2VwKgmbLfc">blue hedgehog</a>&#41;.  Soon as it&#39;s bubbling, carefully pour in some water.  The amount should be able to almost cover the meat and eggs in the pot.</p>
<p>While waiting for the caramel to dissolve, add fish sauce to taste, and throw in a generous number of peppercorns.  Transfer the sauce to the pot, making sure the eggs are fully covered &#40;they can be really chewy when dry: another reason to favor the quail ones&#41;.</p>
<p>Turn the slow cooker on low and cook for around eight hours.  It tastes amazing either hot or cold, and is best served with boiled or pickled vegetables and any kind of starch, commonly rice or sweet potatoes, but you can try bread, potatoes, or even short pasta if you&#39;re feeling adventurous.</p>
<table class="fndef" id="fndef:USTH">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">My ole frens from <a href="https://usth.edu.vn">USTH</a> absolutely dug it during military training&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:ngán">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Okay, maybe I lied about the digging part.</td>
    </tr>
</table><table class="fndef" id="fndef:imperial">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Or 0.12245589 diesel tank in freedom units.</td>
    </tr>
</table><table class="fndef" id="fndef:coconut">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">Anywhere between the size of a matchstick and a chopstick is good.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/kholow@cnx%3E&Subject=Re: Slow Cooked Pork and Eggs">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/kholow@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/kholow/comments.xml</wfw:commentRss>
</item>
<item>
  <title>XML and Photo Gallery Generation: A Love Story</title>
  <link>https://cnx.gdn/blog/pixml/index.html</link>
  <guid>https://cnx.gdn/blog/pixml/index.html</guid>
  <description>How I make my photo gallery in XML and what&#39;s lovely about it</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Fri, 17 Mar 2023 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="xml_and_photo_gallery_generation_a_love_story">XML and Photo Gallery Generation: A Love Story</h1>
<blockquote>
<p>I&#39;m just a language, whose style sheets are good<br />Oh, Lord, please, don&#39;t let me be misunderstood</p>
</blockquote>
<div class="admonition note"><p class="admonition-title">Tips</p><p>As usual, the article starts with a text wall of random rambling. If you are only interested in the technical aspects, feel free to skip the first two sections.</p>
</div>
<div class="franklin-toc"><ol><li>Introduction</li><li>Motivation</li><li>Preliminary</li><li>Approach</li><li>Implementation<ol><li>Page Generation</li><li>Feed Generation</li><li>Thumbnail Generation</li></ol></li><li>Discussion</li><li>Conclusion</li></ol></div>
<h2 id="introduction">Introduction</h2>
<p>Neural-optic live streaming probably, no, definitely offers the most photorealistic graphics one can set eyes on.  <a href="https://en.wikipedia.org/wiki/Computer-generated_imagery">CGI</a> is just a pathetic mimic, and photography or videography is no more than a poor plagiarism attempt when compared to quantum ray-tracing and other advanced physics simulations^W happenings.</p>
<p>On the other hand, we humen are rather shite at replaying visual memories, whilst &#40;<a href="https://en.wikipedia.org/wiki/Data_degradation">bit rot</a> aside&#41; media can be archived <a href="https://xkcd.com/1683">for forever</a>.  Besides, many of us are too busy to <em>touch grass</em> or go see cool things as regularly as we wish to.  This is how an industry based on showing us <a href="https://en.wikipedia.org/wiki/Drama">mundane stuff</a> or <a href="https://en.wikipedia.org/wiki/Fiction">obvious bullcrap</a> can still manage to make tens of thousands of <a href="https://antifandom.com/how-i-met-your-mother/wiki/Crapload">craploads</a> each year and why the interwebs are flooded with pictures of cats, kitties and pussies.</p>
<p>Finding new shits means dopamine dispensation and that&#39;s why <a href="https://www.youtube.com/watch?v&#61;1SNRULEnTVQ">they are dope</a>.  As a model netizen, I adhere to the web&#39;s social contract of mutual <a href="https://fe.disroot.org/@mcsinyx">shitposting</a> so that everyone can have a piece.  Every blue moon, I also enjoy posting more quality stuff like what you are reading right now, should you ignore the number of <a href="https://peervideo.club/w/uByA7Czy7PWYMqnu8FgXvW">Mozart</a> references in the last three paragraphs.</p>
<h2 id="motivation">Motivation</h2>
<p>Some other times, I also want to share the living things and sceneries I encounter in the <a href="https://github.com/zig-community/user-map/pull/120">new</a> place.  My camera was gifted by my father before I moved and yet I shared more photos <a href="https://fotofed.nl/cnx">with strangers</a> than with my family.  The PixelFed instance I landed on irreversibly shrank and lossily compressed them, while dumping 5 MB images to the family chat room just feels weird, hence I decided to gather the decency to build a photo gallery to show my loved ones &#40;and admittedly, flex with online strangers&#41;.</p>
<p>There are not many <a href="https://en.wikipedia.org/wiki/Content_management_system">CMS</a> in the wild for photo hosting, and they often act as a walled garden and/or a social network. Building and hosting a new one is quite overkill, thus the obvious solution left would be generating a static site.  Out of the gazillion <a href="https://en.wikipedia.org/wiki/Static_site_generator">SSG</a>, I couldn&#39;t find any that met my requirements:</p>
<ol>
<li><p>Generate a <a href="https://en.wikipedia.org/wiki/Web_feed">web feed</a></p>
</li>
<li><p>Automate filling <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Img">image</a> title and alt text</p>
</li>
<li><p>Offer fine-grain control for permanent <a href="https://en.wikipedia.org/wiki/Pagination">pagination</a></p>
</li>
<li><p>Generate thumbnails with custom size and name</p>
</li>
</ol>
<p>I mean, they perhaps exist, but the number I had to try and fight through would cost more time than writing the web pages and feed by hand. So I wrote them from scratch.  Y&#39;all can stand up and clap now&#33;</p>
<h2 id="preliminary">Preliminary</h2>
<p>Yes, I really started with writing <a href="https://en.wikipedia.org/wiki/XHTML">XHTML</a> and <a href="https://www.rfc-editor.org/rfc/rfc4287">Atom</a> by hand. A web page has the following structure with namespaces omitted and denoted in WXML &#40;<a href="https://www.draketo.de/software/wisp">Wisp</a>\(\times\)<a href="https://okmij.org/ftp/Scheme/SXML.html">SXML</a>&#41; so I don&#39;t have to close the tags &#40;have I given up on XML too early?-&#41;.</p>
<div class="admonition note"><p class="admonition-title">Syntax hints</p><p>For the uninitiated, any indentation or colon in Wisp represents an additional nest level, while a dot escapes the nesting.  The at signs are used by SXML to denote attributes, which may remind you of <a href="https://www.w3.org/TR/xpath">XPath</a>. For example, the anchor to the previous page is <code>&lt;a href&#61;&quot;41&quot;&gt;PREV&lt;/a&gt;</code>.</p>
</div>
<pre><code class="language-julia">html
  head
    link
      @ : rel &quot;alternate&quot;
          type &quot;application/atom&#43;xml&quot;
          href &quot;/atom.xml&quot;
    ...
  body
    nav
      a : @ : href &quot;41&quot;
        . &quot;PREV&quot;
      h1 &quot;PAGE 42&quot;
      a : @ : href &quot;43&quot;
        . &quot;NEXT&quot;
    article
      @ : id &quot;foobar&quot;
      h2
        a : @ : href &quot;#foobar&quot;
          . &quot;foobar&quot;
      a : @ : href &quot;/42/foo.jpg&quot;
          img
            @ : src &quot;/42/foo.small.jpg&quot;
                alt &quot;pic of foo&quot;
                title &quot;pic of foo&quot;
      a : @ : href &quot;/42/bar.jpg&quot;
          ...
    article ...
    ...
    footer ...</code></pre>
<p>So far, adding an <code>article</code> is not yet too cumbersome, there&#39;s only a bit of redundancy for permanent links and the nesting level is acceptable with the deepest being <code>/html/body/article/a/img</code>.  It gets more repetitive once we publish it to the linked Atom feed:</p>
<pre><code class="language-julia">feed
  entry
    link
      @ : rel &quot;alternate&quot;
          type &quot;application/xhtml&#43;xml&quot;
          href &quot;https://gallery.example/42/#foobar&quot;
    id &quot;https://gallery.example/42/#foobar&quot;
    title &quot;foobar&quot;
    content
      @ : type &quot;xhtml&quot;
      div
        img
          @ : src &quot;https://gallery.example/42/foo.jpg&quot;
              alt &quot;pic of foo&quot;
              title &quot;pic of foo&quot;
        img ...
    updated ...
  entry ...
  ...</code></pre>
<p>Since web feeds are standalone documents, they must always use absolute URLs. &#40;Welp that&#39;s not entirely true, <a href="https://www.w3.org/TR/xmlbase">XML Base</a> does exist, but not all readers support it, and more importantly, certain elements such as <code>atom:id</code> disallow relative references.&#41;  In addition, whilst the web page links a thumbnail to the original image to save bandwidth, the feed can be consumed one post at a time, which thus points to the full size version.  Therefore, copying the markup to embed it inside the Atom is error-prone and doesn&#39;t exactly spark joy.</p>
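<p>For illustration, such fully qualified identifiers can be derived from a single root constant with Python&#39;s stdlib &#40;the names here mirror the example, not any real deployment&#41;:</p>

```python
from urllib.parse import urljoin

# Hypothetical values matching the example: feed root, page number,
# and slugified post title.
root = "https://gallery.example/"
page, slug = "42", "foobar"

# atom:id and alternate links must be absolute; resolve the page path
# against the root, then append the fragment.
entry_url = urljoin(root, page + "/") + "#" + slug
print(entry_url)  # https://gallery.example/42/#foobar
```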
<div class="admonition note"><p class="admonition-title">Fun fact</p><p>What does spark joy is that we can embed XHTML directly into the web feed, which means the content is still XML and we don&#39;t need to quote it in CDATA. For other sites where contents don&#39;t accumulate up to hundreds of megabytes, this will allow us to slap some &#40;SPOILER ALERT&#33;&#41; stylesheet on the Atom feed and let the user agent render it in a <a href="https://simonesilvestroni.com/blog/build-a-human-readable-rss-with-jekyll">human-readable form</a>.</p>
</div>
<h2 id="approach">Approach</h2>
<p>I actually already spoiled it in the epigraph,<sup id="fnref:spoiler">[1]</sup> but for the sake of completeness let us <a href="https://xkcd.com/1445">discuss a few possible solutions</a>. What I wanted was to reduce the redundancy of manual input, in other words, a system transforming a custom information-dense format to standard yet sparser ones, which in this case are XHTML and Atom.  Given some new photos and their relevant data, the purpose was to minimize the publishing friction.</p>
<p>It&#39;s worth mentioning that the goal was not to minimize the input format, the transformation speed, or feedback latency, but all of the above, plus the cost of constructing the tool, incrementally as our requirements slightly change over time.  Our choice for the base <a href="https://programming-journal.org/2023/7/13">programming system</a> shall affect each and every one of these aspects and more.</p>
<p>Some technical dimensions are <a href="https://en.wikipedia.org/wiki/Animal_Farm">more equal</a> than others, though. For this use case, IMHO immediate feedback loop should be given the number one priority, not only because it&#39;d be frustrating to have to complete multiple rituals just to preview the changes, but also as watching and reflecting file system changes is &#40;sadly still&#41; a difficult problem.</p>
<p>For Linux<sup id="fnref:interjection">[2]</sup> there&#39;s <a href="https://man7.org/linux/man-pages/man7/inotify.7.html">inotify</a> which doesn&#39;t suck, except when it does and misses events,<sup id="fnref:entr">[3]</sup> and the standard POSIX build tool <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html">make</a> relies on <a href="https://apenwarr.ca/log/20181113">mtime which is also flaky</a>.  Some SSG work around this by spawning up a server with a more sophisticated caching mechanism and even an HTTP server sending out refresh events. Implementing such a system is easily <a href="https://xkcd.com/1319">more expensive</a> than doing the original task manually.</p>
<p>Luckily, there is another way.  <em>After</em> the birth of imperative DOM manipulation programs running on VM inside browsers &#40;Ecma scripts&#41;, there came a &#40;now forgotten&#41; art of purely functional DOM transformation. More specifically, <a href="https://www.w3.org/standards/xml/transformation">XSLT</a> can declaratively transform any XML document to another, and its best part is that modern browsers natively support it, i.e. there&#39;s no difference between editing the input document and the hypothetical output XHTML.  For better portability and rendering performance, we can still generate the latter ahead-of-time &#40;AoT&#41; during deployment.</p>
<h2 id="implementation">Implementation</h2>
<p>Going back to the example, the input format could boil down to a more concise XML file, e.g. <code>42/index.xml</code>:</p>
<pre><code class="language-julia">page
  @ : prev &quot;41&quot;
      curr &quot;42&quot;
      next &quot;43&quot;
  post
    @ : title &quot;foobar&quot;
        time ...
    picture
      @ : filename &quot;foo&quot;
          desc &quot;pic of foo&quot;
    picture ...
    ...
  post ...
  ...</code></pre>
<h3 id="page_generation">Page Generation</h3>
<p>The stylesheet should then be declared at the beginning of the file, so that the user agent can automatically fetch and apply it to render the output XHTML:</p>
<pre><code class="language-julia">&lt;?xml-stylesheet href&#61;&quot;/page.xslt&quot; type&#61;&quot;text/xsl&quot;?&gt;</code></pre>
<p>XSLT is essentially a templating language, similar to PHP &#40;which is also older&#41; and template libraries in your favorite languages.  For the ease of reading, I will let the target document&#39;s namespace be the default, while aliasing the transformation one as <code>xsl</code>.  The stylesheet for the web pages would look something like the following, which should be self-explanatory.</p>
<pre><code class="language-julia">xsl:stylesheet
  xsl:template : @ : match &quot;/page&quot;
    xsl:variable : @ : name &quot;base&quot;
      xsl:text &quot;/&quot;
      xsl:value-of : @ : select &quot;@curr&quot;
      xsl:text &quot;/&quot;
    html
      head ...
      body
        nav
          xsl:if : @ : test &quot;@prev &#33;&#61; &#39;&#39;&quot;
            a : @ : href &quot;/&#123;@prev&#125;/&quot;
              . &quot;PREV&quot;
          h1 : xsl:text &quot;PAGE &quot;
               xsl:value-of : @ : select &quot;@curr&quot;
          xsl:if : @ : test &quot;@next &#33;&#61; &#39;&#39;&quot;
            ...
        xsl:for-each : @ : select &quot;post&quot;
          xsl:variable : @ : name &quot;id&quot;
            xsl:value-of
              @ : select &quot;translate&#40;@title, &#39; &#39;, &#39;-&#39;&#41;&quot;
          article
            @ : id &quot;&#123;&#36;id&#125;&quot;
            h2
              a : @ : href &quot;#&#123;&#36;id&#125;&quot;
                  xsl:value-of : @ : select &quot;@title&quot;
            xsl:for-each : @ : select &quot;picture&quot;
              a : @ : href &quot;&#123;&#36;base&#125;&#123;@filename&#125;.jpg&quot;
                  img
                    @ : src &quot;&#123;&#36;base&#125;&#123;@filename&#125;.small.jpg&quot;
                        alt &quot;&#123;@desc&#125;&quot;
                        title &quot;&#123;@desc&#125;&quot;
        footer ...</code></pre>
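<p>For readers who prefer imperative code, here is a rough stdlib-only Python equivalent of the stylesheet above; the function name and serialization details are mine for illustration, not part of the actual build:</p>

```python
import xml.etree.ElementTree as ET


def render_page(source):
    """Sketch of page.xslt: concise gallery XML in, XHTML body out."""
    page = ET.fromstring(source)
    base = "/" + page.get("curr") + "/"
    body = ET.Element("body")

    nav = ET.SubElement(body, "nav")
    if page.get("prev"):
        ET.SubElement(nav, "a", href="/" + page.get("prev") + "/").text = "PREV"
    ET.SubElement(nav, "h1").text = "PAGE " + page.get("curr")
    if page.get("next"):
        ET.SubElement(nav, "a", href="/" + page.get("next") + "/").text = "NEXT"

    for post in page.iterfind("post"):
        # translate(@title, ' ', '-') in XPath speak
        slug = post.get("title").replace(" ", "-")
        article = ET.SubElement(body, "article", id=slug)
        h2 = ET.SubElement(article, "h2")
        ET.SubElement(h2, "a", href="#" + slug).text = post.get("title")
        for pic in post.iterfind("picture"):
            # Thumbnail linking to the full-size image
            link = ET.SubElement(article, "a",
                                 href=base + pic.get("filename") + ".jpg")
            ET.SubElement(link, "img",
                          src=base + pic.get("filename") + ".small.jpg",
                          alt=pic.get("desc"), title=pic.get("desc"))
    return ET.tostring(body, encoding="unicode")
```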
<h3 id="feed_generation">Feed Generation</h3>
<p>Similarly, for Atom entries on a single page,</p>
<pre><code class="language-julia">xsl:stylesheet
  xsl:variable : @ : name &quot;root&quot;
    . &quot;https://gallery.example/&quot;
  xsl:template : @ : match &quot;/page&quot;
    xsl:variable : @ : name &quot;base&quot;
      xsl:value-of : @ : select &quot;&#36;root&quot;
      xsl:value-of : @ : select &quot;@curr&quot;
      xsl:text &quot;/&quot;
    xsl:for-each : @ : select &quot;post&quot;
      xsl:variable : @ : name &quot;url&quot;
        xsl:value-of : @ : select &quot;&#36;base&quot;
        xsl:text &quot;#&quot;
        xsl:value-of
          @ : select &quot;translate&#40;@title, &#39; &#39;, &#39;-&#39;&#41;&quot;
      entry
        link
          @ : rel &quot;alternate&quot;
              type &quot;application/xhtml&#43;xml&quot;
              href &quot;&#123;&#36;url&#125;&quot;
        id : xsl:value-of : @ : select &quot;&#36;url&quot;
        title : xsl:value-of : @ : select &quot;@title&quot;
        content
          @ : type &quot;xhtml&quot;
          div
            xsl:for-each : @ : select &quot;picture&quot;
              img
                @ : src &quot;&#123;&#36;base&#125;&#123;@filename&#125;.jpg&quot;
                    alt &quot;&#123;@desc&#125;&quot;
                    title &quot;&#123;@desc&#125;&quot;
        updated : xsl:value-of : @ : select &quot;@time&quot;</code></pre>
<p>The trickier part here is concatenating the entries together. Simple enough, instead of linking to the stylesheet in the data, we can read XML files directly from XSLT.</p>
<pre><code class="language-julia">xsl:template
  @ : match &quot;/&quot;
  ...
  xsl:apply-templates
    @ : select &quot;document&#40;&#39;42/index.xml&#39;&#41;/page&quot;
  xsl:apply-templates ...
  ...</code></pre>
<p>This allows us to do other cool things, such as embedding SVG in XHTML to make use of the parent element&#39;s <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/color_value#currentcolor_keyword">currentcolor</a>, while keeping the source files separate.  It is especially useful for monochromatic icons, e.g.</p>
<pre><code class="language-julia">xsl:copy-of : @ : select &quot;document&#40;&#39;cc.svg&#39;&#41;/*&quot;
xsl:copy-of : @ : select &quot;document&#40;&#39;by.svg&#39;&#41;/*&quot;
xsl:copy-of : @ : select &quot;document&#40;&#39;sa.svg&#39;&#41;/*&quot;</code></pre>
<h3 id="thumbnail_generation">Thumbnail Generation</h3>
<p>So far, we have met three out of the four requirements; the only thing left is creating the thumbnails.  Inspired by Ethan Dalool, I am going for <a href="https://voussoir.net/writing/sharing_photos">fairly large ones of 1024 px in width</a>,</p>
<blockquote>
<p>large enough to comfortably browse the photos without clicking through to the big version of each, and the thumbnails are decently light and not too jpeggy at about 125-150 kilobytes on average.</p>
</blockquote>
<p>At such size, I can aim for around ten photoes<sup id="fnref:toes">[4]</sup> per page while maintaining a somewhat decent load time.  Plus, since the width of the images is hardcoded, page <a href="https://en.wikipedia.org/wiki/Margin_&#40;typography&#41;">margin</a> could be automatically inferred to never stretch them.</p>
<pre><code class="language-css">html &#123;
    box-sizing: border-box;
    margin: auto;
    max-width: calc&#40;1024px &#43; 2ch&#41;;
&#125;
body &#123; margin: 0 1ch &#125;</code></pre>
<p>To generate the thumbnails, I use <a href="https://github.com/mattes/epeg">epeg</a> together with <code>make</code> for wildcarding:</p>
<pre><code class="language-julia">PICTURES :&#61; &#36;&#40;filter-out &#37;.small.jpg &#36;&#40;PREFIX&#41;/&#37;.jpg, &#36;&#40;wildcard */*.jpg&#41;&#41;
THUMBNAILS :&#61; &#36;&#40;patsubst &#37;.jpg,&#37;.small.jpg,&#36;&#40;PICTURES&#41;&#41;

&#37;.small.jpg: &#37;.jpg
	epeg -w 1024 -p -q 80 &#36;&lt; &#36;@</code></pre>
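<p>The filter/substitute dance above can be sanity-checked outside of make; a rough Python equivalent of the two variables &#40;file names made up, and the <code>&#36;&#40;PREFIX&#41;</code> filter omitted for brevity&#41;:</p>

```python
# Pretend directory listing, as make's $(wildcard */*.jpg) would return it.
pictures = ["42/foo.jpg", "42/bar.jpg", "42/foo.small.jpg"]

# $(filter-out %.small.jpg, ...): drop already-generated thumbnails.
originals = [p for p in pictures if not p.endswith(".small.jpg")]

# $(patsubst %.jpg,%.small.jpg,...): derive each thumbnail's name.
thumbnails = [p.removesuffix(".jpg") + ".small.jpg" for p in originals]
print(thumbnails)
```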
<p>The Makefile also defines rules for AoT compilation using <a href="https://gnome.pages.gitlab.gnome.org/libxslt/xsltproc.html">xsltproc</a> for the web pages and feed.  Apparently no feed reader supports XSLT, and for pages, runtime processing negatively affects the performance due to the multiple round trips for the stylesheet and the vector icons.</p>
<pre><code class="language-julia">DATA :&#61; &#36;&#40;wildcard */index.xml&#41; index.xml
PAGES :&#61; &#36;&#40;patsubst &#37;.xml,&#37;.xhtml,&#36;&#40;DATA&#41;&#41;
OUTPUTS :&#61; &#36;&#40;THUMBNAILS&#41; &#36;&#40;PAGES&#41; atom.xml

all: &#36;&#40;OUTPUTS&#41;

index.xml: &#36;&#40;LATEST&#41;/index.xml
	ln -fs &#36;&lt; &#36;@

&#37;.xhtml: &#37;.xml page.xslt
	xsltproc page.xslt &#36;&lt; &gt; &#36;@

atom.xml: atom.xslt &#36;&#40;DATA&#41; &#36;&#40;wildcard *.svg&#41;
	xsltproc atom.xslt &gt; atom.xml</code></pre>
<p>The <a href="https://trong.loang.net/~cnx/px">full implementation</a> is deployed to <a href="https://px.cnx.gdn">px.cnx.gdn</a>, mirrored to the <a href="https://www.opennic.org">OpenNIC</a> domain <a href="https://pix.sinyx.indy">pix.sinyx.indy</a> reusing the former&#39;s TLS certificate, because CA/Browser Forum disallows support for domains not recognized by ICANN and no <a href="https://wiki.opennic.org/opennic/tls">CA for OpenNIC</a> is mature enough.</p>
<h2 id="discussion">Discussion</h2>
<blockquote>
<p><em>Okay you built your site using XML macros, so what? The syntax is clunky and you hate it so much yourself that not even a single line of code example here is in actual XML. Doesn&#39;t seem like a love story to me&#33;</em></p>
</blockquote>
<p>Like all relationships, it&#39;s not that simple.  I&#39;ve learned to not judge a book by its cover and come to the understanding that XML is the &#40;ugly&#41; equivalent of <a href="https://en.wikipedia.org/wiki/S-expression">sexp</a>.<sup id="fnref:sex">[5]</sup>  Unlike afterthoughts such as C preprocessors, <a href="https://docs.djangoproject.com/en/dev/topics/templates">Django</a>-like templates, or even the Wisp-lookalike syntax of <a href="https://github.com/slim-template/slim">Slim</a>, XML stylesheets are in the same data structure.  To put it another way, one can use XSLT to generate XSLT from XSLT.  Do I need it in this case or ever at all?  Probably not, but that certainly makes XSL a lot more attractive in my eyes.</p>
<p>Furthermore, the tooling for XML is highly mature, from editors to linters and processors to rendering engines.  It&#39;d be lying to say you ain&#39;t fascinated that tis possible to directly feed browsers pure data instead of markup representations.  More than that, one can have entirely static API endpoints that are both human- and machine-readable.</p>
<blockquote>
<p><em>XSL is just declarative JS&#33;  You are so blinded by your lust for functional programming that you have become <a href="https://cnx.gdn/blog/reply">the very thing you swore to destroy</a>&#33;</em></p>
</blockquote>
<p>My distaste for Ecma scripts is not due to DOM manipulation. Sure, I do find in-place modification inelegant for documents, but if only that were the only issue.  I block them on most sites because they can interact with many things other than just the DOM, imposing <a href="https://en.wikipedia.org/wiki/Mouse_tracking">privacy</a> and <a href="https://react-etc.net/entry/exploiting-speculative-execution-meltdown-spectre-via-javascript">security</a> risks while <a href="https://meta.stackexchange.com/q/2980/698165">fucking up the UX</a>.</p>
<p>Architecturally, Ecma scripts enable the absolute bloody worst possible kind of web pages with zero data at all, fetching tiny pieces of content in JSON and turning performance <a href="https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html">to shit</a>.  The user agents then try to salvage efficiency by turning themselves into a distributed system component and adding optimizations that shall never be &#40;ab&#41;used for the sake of users. O ye <a href="https://en.wikipedia.org/wiki/Wirth&#37;27s_law">cycle of doom</a>&#33;</p>
<p>Note that one can make a similar mistake with XSL regarding the number of round trips, and XML stylesheets can provide the same front-end/back-end separation.  Both can be used to provide hot loading during development and AoT rendering in production &#40;if not all, then many JS libraries support pre-rendering, ignoring the monstrous <a href="https://cnx.gdn/blog/dedep">dependency graph</a>&#41;. At the end of the day, it&#39;s not a matter of technology but of principle: to be in the <a href="https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys">users&#39; best interest</a>.</p>
<blockquote>
<p><em>There is nothing complex about the photo gallery, any existing SSG can do the same with minor tweaks&#33; You never needed to write a new one to begin with&#33;</em></p>
</blockquote>
<p>I am wondering the same myself, but keep in mind there are details I&#39;ve been hiding from you in the example.  I went all-in for the semantic web with the hope for best portability and accessibility.  One thing I haven&#39;t mentioned is the <code>lang</code> attribute, e.g. <code>en</code>, <code>vi</code> or <code>fr</code> depending on the post.  Adding this to the web pages requires the SSG to be somewhat modular, and it&#39;s even harder for the web feed.</p>
<p>Moreover, generic SSG are not designed to handle the difference in content between a page&#39;s <code>article</code> and the feed&#39;s corresponding <code>entry</code>, nor to have multiple posts in a single page.  Pagination is also commonly implemented backwards, i.e. page 2 being the second latest one, making it impossible to avoid link rot.</p>
<p>Not to suggest that the majority of SSG are poorly designed, just that past a certain amount of <a href="https://guide.handmade-seattle.com/c/2021/context-is-everything">context</a> difference, tis cheaper to just redesign from scratch.  This is not about XSL vs Go/Python/JS for SSG or web dev in general, but about this specific and happen-to-be-far-from-complex case.</p>
<h2 id="conclusion">Conclusion</h2>
<p>At the time of writing, XML has pretty much been superseded by JSON or YAML, for better or worse.  I have no love for YAML for obvious reasons, but it also saddens me to sometimes see JSON being used solely as a container for HTML.  I hope that this essay can <a href="https://www.youtube.com/watch?v&#61;F3QPWrLFsOA">awaken something in you</a> about XML and remind you about the semantic web in your next project.  It worked out for me, maybe it&#39;ll work out for you too&#33;</p>
<p>The story between XML and my photo gallery is a fond love story. They were born for each other, there was no drama, everything just werkt. Their romance inspires me to better appreciate stability and maturity, and to value those right in front of my eyes that I had been <em>too blind to see</em>. Anyway, this is getting too long, so Imma end it with another <a href="https://www.youtube.com/watch?v&#61;5LvOdWi3Qno">song</a>.</p>
<blockquote>
<p>Lookin&#39; for perfect<br />Surrounded by artificial<br />You&#39;re the closest thing to real I&#39;ve seen<br />Sure, everyone has their problems<br />That&#39;s a given<br />Yours are the easiest to tolerate</p>
</blockquote>
<table class="fndef" id="fndef:spoiler">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">If you know, you know.</td>
    </tr>
</table><table class="fndef" id="fndef:interjection">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Yup, just the kernel.</td>
    </tr>
</table><table class="fndef" id="fndef:entr">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">But in case it works for you, check out <a href="https://eradman.com/entrproject">entr</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:toes">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content"><em>Thumb</em>nails, pho<em>toes</em>, get it?-&#41;</td>
    </tr>
</table><table class="fndef" id="fndef:sex">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Or conventionally in most Lisp 1&#39;s, <code>sex?</code>.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/pixml@cnx%3E&Subject=Re: XML and Photo Gallery Generation: A Love Story">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/pixml@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/pixml/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Artisanal GIF Ripping</title>
  <link>https://cnx.gdn/blog/gifrip/index.html</link>
  <guid>https://cnx.gdn/blog/gifrip/index.html</guid>
  <description>How to make GIF files from videos</description>
  <category>fun</category><category>recipe</category>
  <pubDate>Thu, 16 Mar 2023 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="artisanal_gif_ripping">Artisanal GIF Ripping</h1>
<div class="admonition note"><p class="admonition-title">Pronunciation</p><p>/dʒɪf/ is the format, /ɡɪf/ is the handball club in Eskilstuna, Sweden.</p>
</div>
<p>GIF, <em>the</em> graphics interchange format, is probably the most <a href="https://telegram.org/blog/gif-revolution">inefficient</a> representation for animated images in quality/size that is widespread today. However, it does live up to its name, being also the most portable format for animated graphics exchange.  If a device has a color display and is connected to the Internet, tis likely to support GIF out of the box.</p>
<p>Like with incandescent light bulbs, it&#39;d be wasteful not to have switched to more efficient alternatives, though GIF still has its charms.  Not having to worry about codec compatibility is one thing, the nostalgia induced by the <code>.gif</code> file extension is another.  For years, I&#39;ve kept an <em>Internet</em> folder full of those &#40;along with still images and short videos&#41; for offline viewing, shitposting and reactions.</p>
<p>More recently, I <a href="https://nixnet.social/notice/A8VniqEBKfJvMc2dTE">began</a> <a href="https://nixnet.social/notice/A9gMc47yxgoTNDIa7U">to</a> <a href="https://fe.disroot.org/notice/ANK2GqGxIdcDGBRAFU">make</a> <a href="https://nixnet.social/notice/AOhGjOUwKJmiEFNLkG">my</a> <a href="https://fe.disroot.org/notice/APhUh2H8radlKKIZGa">own</a> <a href="https://fe.disroot.org/notice/APhUh2H8radlKKIZGa">animated</a> <a href="https://larkspur.one/notice/ATe4i9UvhxjiFvXprs">images</a> and take pride in them.  This tutorial lays out, step by step, my process for cutting a high-quality GIF out of a video, so <a href="https://www.youtube.com/watch?v&#61;TVMK2gQig4A">you can be just like me</a>&#33;</p>
<div class="franklin-toc"><ol><li>Decide on the Format</li><li>Extract Frames</li><li>Crop and Resize</li><li>Combine</li></ol></div>
<h2 id="decide_on_the_format">Decide on the Format</h2>
<blockquote>
<p>Just because you can doesn&#39;t mean that you should.</p>
</blockquote>
<p>Good things often don&#39;t come out of desire, but necessity. GIF is cool because tis portable, but plain text is even more portable. Multimedia files weigh hundreds of kilobytes each, so they&#39;d better be the best way to convey whatever information they&#39;re meant to communicate.  Don&#39;t replace the audio with a subtitle just so it can be an animated image. Don&#39;t loop anything longer than a few seconds.</p>
<p>There is an old saying: <em>if tis doing fine being a video, let it be a video</em>. Videos don&#39;t need each frame to be perfect, and thus all the following steps can be done in a single ffmpeg command.  Work smart, even just for shitposting.</p>
<h2 id="extract_frames">Extract Frames</h2>
<p>Open the source video in <a href="https://mpv.io">mpv</a>, seek, then spam <code>s.</code> &#40;screenshot, then frame-step&#41; repeatedly. That&#39;s all, y waste time write lot word wen few word do trick?</p>
<p>Aight, maybe there&#39;s a bit more to it.  Operating on the level of frames will allow you to skip redundant ones in case of misencoded sources or duplicate and reverse a subsequence for a closer-to-perfect loop, but really, there&#39;s not much to talk about.</p>
<h2 id="crop_and_resize">Crop and Resize</h2>
<p>Ripping a GIF from a video is taking a sequence of frames out of context. Framing and some objects in the scene might not make sense in the target animated image.  Some also like to export videos with giant black bars just to fuck with us.  That&#39;s where cropping comes into play.  For measuring I use <a href="https://www.gimp.org">GIMP</a>, which doesn&#39;t seem to be the right tool for the job, so please let me know of anything lighter that has a ruler.</p>
<p>In addition, videos from social media &#40;and space-efficient movies&#41; are heavily compressed and look pretty bad for their resolution.  Usually I have to shrink them down to two thirds or half of their original width for them to look decently sharp.  When you have the geometry in mind, summon the <a href="https://imagemagick.org">image wizard</a>:</p>
<pre><code class="language-sh">crop&#61;&quot;-crop &#36;&#123;w&#125;x&#36;&#123;h&#125;&#43;&#36;&#123;dx&#125;&#43;&#36;&#123;dy&#125;&quot;
resize&#61;&quot;-resize &#36;&#123;width&#125;x&#36;&#123;height&#125;&quot;
parallel -i sh -c &quot;convert &#36;crop &#36;resize&quot;\
&#39; &#123;&#125; &#36;&#40;basename &#123;&#125; .jpg&#41;.png&#39; -- mpv-shot*.jpg</code></pre>
<p>In case you don&#39;t have <a href="https://joeyh.name/code/moreutils">moreutils</a>/<a href="https://www.gnu.org/software/parallel">GNU parallel</a>, or if you are rocking a single-CPU machine, run:</p>
<pre><code class="language-sh">mogrify &#36;crop &#36;resize -format png mpv-shot*.jpg</code></pre>
<p>For resizing, image processing expert Nicolas Robidoux wrote a long article on <a href="https://imagemagick.org/Usage/filter/nicolas">resampling filters</a>.  Although I can&#39;t spot any distinction for this use case, it doesn&#39;t hurt to give it a read and play around with different options.</p>
<p><img src="https://cnx.gdn/assets/same-picture.png" alt="Pam stating that they are the same picture" /></p>
<h2 id="combine">Combine</h2>
<p>We are going to encode in the best quality possible using <a href="https://gif.ski">gifski</a>. The reason we converted from JPEG to PNG earlier is that our precious encoder refuses to touch any other <em>image</em> format, and that mpv takes screenshots in JPEG by default &#40;you can <a href="https://mpv.io/manual/master/#screenshot">configure</a> it to write in a lossless format, but in my experience the source videos are often too heavily compressed for it to make any difference&#41;.  Anyhow, invoking gifski is rather straightforward:</p>
<pre><code class="language-sh">gifski -r &#36;fps -Q 100 -o &#36;name.gif mpv-shot*.png</code></pre>
<p>Note that it optionally takes width and height as arguments and yields better quality than with pre-shrunken images, but at the cost of a significantly larger file size.  Choose wisely.</p>
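<p>Should you want mpv to write lossless screenshots from the start, the relevant option &#40;per the mpv manual&#41; goes in <code>mpv.conf</code>:</p>
<pre><code># ~/.config/mpv/mpv.conf
screenshot-format=png</code></pre>
<p>That skips the JPEG-to-PNG conversion step entirely, though as noted above it rarely helps with heavily compressed sources.</p>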
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/gifrip@cnx%3E&Subject=Re: Artisanal GIF Ripping">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/gifrip@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/gifrip/comments.xml</wfw:commentRss>
</item>
<item>
  <title>The 2020 Experience</title>
  <link>https://cnx.gdn/blog/2020/index.html</link>
  <guid>https://cnx.gdn/blog/2020/index.html</guid>
  <description>My life in 2020</description>
  <category>lyf</category><category>exp</category>
  <pubDate>Sat, 07 Jan 2023 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="the_2020_experience">The 2020 Experience</h1>
<div class="admonition note"><p class="admonition-title">Not to be confused with <em>The 20/20 Experience</em></p></div>
<div class="franklin-toc"><ol><li>The Germination</li><li>The Fruition</li><li>The Disease</li><li>The Profit</li><li>The Migrations</li><li>The Moral</li></ol></div>
<h2 id="the_germination">The Germination</h2>
<p>To understand my 2020, we have to travel back a few months, when it all started.  No, not <em>that thing</em> beginning at the end of &#39;19. I am talking about <em>my</em> 2020 experience, remember?</p>
<p>The story started in October 1810 in the not-so-little city of Munich, Germany. Alright, it sounds like I lied about the 2019 and my story part, but bear with me, it&#39;s all connected.  Anyhow, some Bavarian couple got married and threw a big party.  People like parties, so naturally they celebrated the anniversaries, year after year, until it became a tradition known in English as the <a href="https://en.wikipedia.org/wiki/Oktoberfest">Oktoberfest</a>.</p>
<p>Over two centuries later, on the wedding day of another Bavarian couple,<sup id="fnref:wedding">[1]</sup> DigitalOcean began an annual PR campaign in the same month called Hacktoberfest.  I know, to many of you maintaining projects on GitHub &#40;and more recently GitLab.com&#41;, the name might not remind you of anything festive, but it really opened a new chapter in my life.</p>
<p><a href="https://www.whoismrrobot.com">Back to the future</a> in 2019, it was my first year taking part in the event. The premise was that one would receive a t-shirt after having filed at least four GitHub Pull Requests™.<sup id="fnref:pr">[2]</sup>  Unlike <em>plethora</em>, this does not sound like a lot, yet it was more than I had ever done.  Getting out of my comfort zone was the first baby step, opening various opportunities in the upcoming months and perhaps, years.</p><figure>
  <img src=https://cnx.gdn/assets/codersrank.png
       alt='Graph showing steeper growth from October 2019'>
  <figcaption>My activities on GitHub over the years</figcaption>
</figure><h2 id="the_fruition">The Fruition</h2>
<p>Probably what I benefited the most from participating in Hacktoberfest was learning not to be afraid of communicating with complete strangers maintaining the software I use.  Stepping into 2020, I started to do a larger variety of stuff in Python, which made installing libraries a regular occurrence.  The international Internet connection from home was unstable at the time, and downloads from the package warehouse were usually a few kBps, which definitely did not help.  A few moments later, I found myself on <a href="https://pypa.io">PyPA</a>&#39;s IRC channel discussing strategies to speed up pip downloading.</p>
<p>After several days of on-off conversations &#40;mostly I was asking questions to fill in the blanks&#41;, a proposal was under draft: I was an undergrad sophomore and had been eyeing Google Summer of Code &#40;GSoC&#41; for quite a while. Applying for pip wasn&#39;t the plan; rather, it was <a href="https://octave.org">Octave</a>, the first big project I had contributed code to.<sup id="fnref:1st">[3]</sup>  Thinking about it now, pip was the better choice since I was more comfortable with its tech stack. The <a href="https://cnx.gdn/blog/2020/gsoc">rest of the story</a> was already noted down so I won&#39;t be retelling it here.</p>
<h2 id="the_disease">The Disease</h2>
<p>When the world had been battling SARS-CoV-2 for a few months, Việt Nam was barely affected.  By refusing inbound travelers and temporarily switching to work/study-from-home, the number of cases and deaths was negligible and by the end of summer we were virtually back to normal.  I hated that most organizations, my university included, straight up offered big techs our data without a second thought, and was thankful online learning did not last.</p>
<p>Like many others, I spent that summer rarely leaving the house.  I was grateful to GSoC for keeping me busy and giving me the opportunity to socialize with new cool people.  It was impossible for me to catch <em>the</em> virus, I thought. I was not wrong, but I got something else: <a href="https://en.wikipedia.org/wiki/Dengue_fever">dengue fever</a>.  The fever wasn&#39;t too bad, I was high as a kite for half a week, but never critical. The aftermath, however, was much less pleasant.</p>
<p>For the next week, I was in a living hell because of a throat infection. I&#39;d had sore throats before, quite regularly in fact, often at least once every few months, but they had been a mere inconvenience.  Usually, all I&#39;d gotta do had been to <a href="https://en.wiktionary.org/wiki/person_up">person up</a>, swallow a few times and get on with my day.  This was different.  Everything hurt like a bitch.  The slightest texture or flavor could cause minutes of pain.  For the first time, I experienced throat lozenges being the opposite of soothing.</p>
<p>For the entire week, I survived on undercooked scrambled eggs and mushy porridge.  I had to take α-chymotrypsin<sup id="fnref:choay">[4]</sup> before every meal and was practically microdosing it throughout the day to be able to drink water.  You can&#39;t imagine how happy I was when I could finally eat rice again.  While the infection was not directly caused by dengue &#40;it only weakened my immune system&#41;, the trauma was enough to make me finally care about home mosquito eradication.  Guess who learnt it the hard way&#33;</p>
<h2 id="the_profit">The Profit</h2>
<p>GSoC gave me a stipend of 3000 USD, minus Payoneer fees and shitty currency exchange &quot;tax&quot;.  That was the largest sum I&#39;d ever had in my hands. Because of the low cost of living in Việt Nam,<sup id="fnref:cost">[5]</sup> I was no longer completely financially dependent on my parents.  I could pay my own school fees &#40;the scholarship would give back the money <em>months</em> after paying&#41;, hang out more with friends &#40;we had zero-COVID for a while, remember?&#41;, and tip free software projects and services I had &#40;and have&#41; been using for years.</p>
<p>More importantly, I could buy myself <em>future</em> e-waste.  I got a <a href="https://www.pckeyboard.com">Model M</a> so that I no longer need to change keyboard every year, a <a href="https://video.hardlimit.com/w/uucN1eWVurTSzY325PLS2s">lefty</a> <a href="https://ploopy.co">Ploopy</a> to ease my traffic-accident-injured right wrist that&#39;s prolly never gonna fully heal, a <a href="https://nixnet.social/notice/AI9eETauDunmiiIfHE">new DAP to replace my dead walk man</a>, my <a href="https://cnx.gdn/blog/2020/gsoc/article/4/#snap_back_to_reality">first phone</a> and perhaps some other things.  <a href="https://www.youtube.com/watch?v&#61;5z25pGEGBM4">No worries</a>, I&#39;m still daily driving them today, they ain&#39;t ended up in the landfill &#40;yet&#41;.</p>
<h2 id="the_migrations">The Migrations</h2>
<p>Admittedly, the first <em><a href="https://freedesktop.org">freedesktop.org</a> smartphone</em> that caught my eye was actually the Librem 5, which I could afford neither the time nor the money for. I know, the terminology sounds ridiculous, but <em>Linux</em> would include Android and <em>GNU</em>&#39;d exclude <a href="https://postmarketos.org">postmarketOS</a>.  Anyway, <a href="https://puri.sm">Purism</a>, the company behind the Librems, has seriously invested in adaptive GUI and federated services. My first <a href="https://activitypub.rocks">ActivityPub</a> account was provided by <a href="https://librem.one">Librem One</a>.</p>
<p>It was not the first time I&#39;d used a federated service.  I&#39;ve used email for as long as I can remember and had been using <a href="https://matrix.org">Matrix</a> intensively since I entered university.  So what &#40;was there to be&#41; changed? At the time, my online presence<sup id="fnref:jargon">[6]</sup> was primarily inside <a href="https://github.com/McSinyx/mcsinyx.github.io/commit/af8e02ec3989.patch">surveillance capitalist walled gardens</a>.  I was mostly active&#40;ly posting&#41; on bird site socializing with people I&#39;d gotten acquainted with during my GSoC and publishing my development/shitpost<sup id="fnref:log">[7]</sup> videos to YouTube.</p>
<p>Nothing on fedi really caught my eyes, until I got &#40;hyped up for getting&#41; my PinePhone.  Its software landscape was incredibly fast moving back then. Most peripherals were barely working.  Desktop programs were being ported for narrower screens using brand new convergent libraries.  Many developers were contracted by Purism or sponsored by Pine64, a large fraction of whom are free software purists, rejecting spyware disguised as social media. Never before had hanging out in chat rooms<sup id="fnref:bridge">[8]</sup> and on the Fediverse been the absolute best way to keep up with life-quality-changing updates.</p>
<p>Like with desktop-handheld convergence, I was impressed with the Fediverse&#39;s interoperability between multiple media formats, from &#40;micro&#41;blogs to picture albums to videos.  Imagine being able to share and comment on a YouTube video directly from Twitter&#33;  Shortly after, I registered for a <a href="https://joinpeertube.org">PeerTube</a> account and migrated all my videos there.  The longer I stayed on fedi, the more cool stuff I found and the more satisfied I was.  Fast forward over two years, I have deleted or permanently logged out of most of those walled-garden accounts; only quiddit<sup id="fnref:reddit">[9]</sup> is left.</p>
<p>One thing led to another, <a href="https://blog.brixit.nl/apps">Martijn Braam&#39;s apps</a> introduced me to <a href="https://sourcehut.org">SourceHut</a>, which embraces email for federation and focuses on useful stuff like <a href="https://man.sr.ht/builds.sr.ht/build-ssh.md">SSH for CI</a>, instead of trying to be a <a href="https://arxiv.org/abs/2006.02371">social media</a> or <a href="https://githubcopilotlitigation.com">relicense the projects it hosts</a>. I have moved most of the software I maintain <a href="https://GiveUpGitHub.org">from GitHub</a> to sr.ht, but the network effect is too strong: I still have to stick with the former to contribute to software I regularly use.</p>
<p>However, it&#39;s unlikely that most of those growing up with GitHub, especially inexperienced contributors, will be <a href="https://adol.pw/2022/05/09/maintaining-first-project-part-iv-end">willing to adapt to a workflow revolving around mailing lists</a> for this kind of forge to become mainstream again.  On the bright side, I&#39;m starting to see more large projects hosting their own development platforms, and I am watching <a href="https://forgefriends.org/blog/2022/06/30/2022-06-state-forge-federation">forge federation</a> with great interest.</p>
<h2 id="the_moral">The Moral</h2>
<p>At this point, you probably wonder what I&#39;m trying to tell with all this random rambling.  Welp, nothing.  My life is <a href="https://www.youtube.com/watch?v&#61;9ewTkrfaWtA">not like the movies</a>, there ain&#39;t no plot, no meaning.  The whole point of this log is to bridge the gap between <a href="https://cnx.gdn/blog">/blog</a> and <a href="https://cnx.gdn/blog/2020/gsoc">/blog/2020/gsoc</a>.  2020 was indeed positively life-changing for me, tho I can&#39;t expect most of y&#39;all&#39;ll be able to relate.  2023 is already underway, and I hope we will all have a year we can look back on the same way I did in this post.  <a href="https://fe.disroot.org/@mcsinyx/posts/ALaW77HgCSPq4pLxpo">Perchance.</a></p>
<table class="fndef" id="fndef:wedding">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">There must be at least one wedding every day in Bavaria, I think.</td>
    </tr>
</table><table class="fndef" id="fndef:pr">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">It is a vendor locked-in version of <a href="https://git-scm.com/docs/git-request-pull">git-request-pull</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:1st">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Not counting Vim because it was a <a href="https://cnx.gdn/works/#simplified_vietnamese_keymaps">keymap</a> contribution.</td>
    </tr>
</table><table class="fndef" id="fndef:choay">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">Proteolytic enzyme; taken orally for inflammation.  Shit&#39;s magic.</td>
    </tr>
</table><table class="fndef" id="fndef:cost">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">A meal at a diner cost around 1 USD at the time.</td>
    </tr>
</table><table class="fndef" id="fndef:jargon">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">Gah, I hate this term&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:log">
    <tr>
        <td class="fndef-backref">[7]</td>
        <td class="fndef-content">I don&#39;t like keeping too serious logs.</td>
    </tr>
</table><table class="fndef" id="fndef:bridge">
    <tr>
        <td class="fndef-backref">[8]</td>
        <td class="fndef-content">A room was bridged between 5 protocols, fun but also an eyesore.</td>
    </tr>
</table><table class="fndef" id="fndef:reddit">
    <tr>
        <td class="fndef-backref">[9]</td>
        <td class="fndef-content">Hey, the site name was a pun on <em>read it</em> in the first place&#33;</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020@cnx%3E&Subject=Re: The 2020 Experience">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Advent of Programming Languages</title>
  <link>https://cnx.gdn/blog/advent/index.html</link>
  <guid>https://cnx.gdn/blog/advent/index.html</guid>
  <description>Doing Advent of Code in a new programming language each day</description>
  <category>fun</category><category>exp</category>
  <pubDate>Mon, 26 Dec 2022 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="advent_of_programming_languages">Advent of Programming Languages</h1>
<p>Earlier this year I enrolled in a master&#39;s programme<sup id="fnref:master">[1]</sup> at <a href="https://unist.ac.kr">UNIST</a> and joined the Programming Languages and Software Engineering lab &#40;PLaSE&#41; as a student researcher.  The stipend covers the school fees and living expenses, and I&#39;m given <em>an</em> academic freedom to choose what to work on and take risks. I will review the life here in detail in another post, but &#40;SPOILER ALERT&#33;&#41; overall I&#39;m quite content with it.</p>
<p>That being said, PLaSE is new and small, we only do research on software engineering and don&#39;t do its name justice.  Because of that, in the first year here I decided to do each day of <a href="https://adventofcode.com">Advent of Code</a><sup id="fnref:2021">[2]</sup> in a language I&#39;d never used in competitive programming &#40;CP&#41; before.</p>
<p><img src="https://cnx.gdn/assets/mr-worldwide.jpg" alt="Pitbull holding the globe, captioned: Mr. Worldwide" /></p>
<p>Here was my blacklist going in, chronologically: Pascal, Python, Scheme, C, C&#43;&#43;, Common Lisp, Lua, Raku, Go, Rust and Zig.  I am only proficient in over half of the listed languages, but dura lex, sed lex, I&#39;d already had my CP first time with the rest.</p>
<p>To try any new language, all I have to do is drop into an ephemeral shell with its implementation using <code>nix-shell</code> or <code>guix shell</code> without the fear of bloating up my systems.  I&#39;m running <a href="https://cnx.gdn/blog/butter">NixOS on my laptop</a> with <a href="https://search.nixos.org/packages">nixpkgs</a> being one of the largest downstream repositories, including everything but the kitchen sink.  On the work desktop, I installed Guix System which has a decent <a href="https://packages.guix.gnu.org">set of packages</a> and a <a href="https://trong.loang.net/~cnx/dotfiles/tree/guix/system.scm?id&#61;b53f96565b8c#n51">nix service</a> in case something is missing.  Every update, I run <a href="https://nixos.org/manual/nix/stable/command-ref/nix-collect-garbage.html">garbage</a> <a href="https://guix.gnu.org/manual/en/html_node/Invoking-guix-gc.html">collection</a> and get rid of all unnecessary software, i.e. those not <a href="https://trong.loang.net/~cnx/dotfiles/tree/guix">declared</a> <a href="https://trong.loang.net/~cnx/dotfiles/tree/nix">in my config</a>.</p>
<h2 id="day_one">Day One</h2>
<p>The first day should have been the warm-up, so I challenged myself with using POSIX utilities.  This is a bit ironic though, as the majority of my time spent outside of <a href="https://www.vim.org"><em>the</em> editor</a> or a web browser is inside a &#40;<a href="https://www.youtube.com/watch?v&#61;k5E6CExu204">Bourn-again</a>&#41; shell.</p>
<p>The <a href="https://adventofcode.com/2022/day/1">problem</a> was indeed simple, involving only <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;ff0bb53c15dd">finding the maxima among the sums of newline-separated numbers</a>.  I used <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html">sed</a>&#40;1p&#41; to turn the input into <a href="https://linux.die.net/man/1/dc">dc</a>&#40;1&#41; expressions, and <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html">sort</a>&#40;1p&#41; and <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tail.html">tail</a>&#40;1p&#41; for picking the largest sum.  Probably the most interesting part was that the summation was reusable to <a href="https://larkspur.one/notice/AQALVP69wAiotsVmgC">grade an assignment</a> for a course I was a teaching assistant for.</p>
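<p>In case you wonder how such a pipeline might fit together, here is a rough sketch, a hypothetical reconstruction rather than the committed solution: sed rewrites each number into a dc addition and each blank line into a print-and-reset, then sort and tail pick the winner.</p>
<pre><code class="language-sh">solve() {
    # sed rewrites the input into a dc program: every number becomes an
    # addition to the running sum, every blank line prints that sum,
    # clears the stack, and reseeds it with zero for the next group
    { printf '0\n'
      sed -e 's/^\([0-9]\{1,\}\)$/\1 +/' -e 's/^$/p c 0/'
      printf 'p\n'
    } | dc | sort -n | tail -n 1    # the largest group sum wins
}

printf '1000\n2000\n\n3000\n4000\n\n5000\n' | solve    # prints 7000</code></pre>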
<h2 id="day_two">Day Two</h2>
<p>The <a href="https://adventofcode.com/2022/day/2">second problem</a> didn&#39;t ramp up much in difficulty. It only called for some rather <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;ada3a69b15ff">simple arithmetic</a>, and the input format&#39;s regularity convinced me to finally give <a href="https://harelang.org">Hare</a> a try.</p>
<p>For just a taste, Hare is boring in a good way.  I was excited for the tagged union of <a href="https://harelang.org/tutorials/introduction/#defining-new-error-types">error which can include and propagate any debugging information</a>, but unfortunately it wasn&#39;t needed for programs of such complexity &#40;not that errors are ever handled in CP&#41;. I&#39;m looking forward to a chance to write more Hare in the future.</p>
<h2 id="day_three">Day Three</h2>
<p>The <a href="https://adventofcode.com/2022/day/3">task for day 3</a> was literally day \(1 &#43; 2\) in scope. I went for another <em>better C</em> that is <a href="https://nim-lang.org">Nim</a>.  My first impression of it wasn&#39;t positive: Nim insists on considering each source file as a module and does not allow hyphens in identifier names, so <a href="https://forum.nim-lang.org/t/5024">filenames mustn&#39;t have any hyphen</a> either.  This led me to piping the source code to <code>nim c -</code> and executing <code>~/.cache/nim/stdinfile_d/stdinfile</code> to keep my solution naming convention. <code>nim r -</code> wouldn&#39;t have worked either since the convention also consists of reading the input from stdin.</p>
<p>On the bright side, <a href="https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax">uniform function call syntax</a>, identifier case-&#40;and underscore-&#41;insensitivity and optional parentheses allowed me to <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;eeb9a45346a8">dodge parentheses in calls and camelCasing altogether</a>. Although I <em>love</em> Lisp and don&#39;t have any problem with brackets, I think their placement in ALGOL style hurts the readability of nested calls and <a href="https://www.cs.kent.edu/~jmaletic/papers/ICPC2010-CamelCaseUnderScoreClouds.pdf">camelCase is just objectively bad</a>, pun<sup id="fnref:obj">[3]</sup> unintended.</p>
<h2 id="day_four">Day Four</h2>
<p>The <a href="https://adventofcode.com/2022/day/4">fourth problem</a> wasn&#39;t any harder, only requiring <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;8941f621840f">simple logic operations and summation</a>.  To save time, I opted for <a href="https://julialang.org">Julia</a>, which I was kinda sorta familiar with in building this site &#40;at the time this is published at least&#41;. Like Nim, it has higher-order functions and a &#40;reference&#41; compiler capable of producing fast binaries.</p>
<h2 id="day_five">Day Five</h2>
<p>The <a href="https://adventofcode.com/2022/day/5">next day&#39;s task</a> was finally a breath of fresh air with <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;aa7616140a8b">matrix parsing and LIFO &#40;literal&#41; stacks</a>.  It begged for a regular expression parser,<sup id="fnref:re">[4]</sup> hence I mined a tiny bit of Ruby for the task.  Ruby had been designed to be an object-oriented Perl, and expectedly it feels very similar to Raku.  To an extent, I was also able to avoid ALGOL-style calls and do quite a <a href="https://www.codesections.com/blog/raku-lisp-impression">Lisp impression</a>.</p>
<p>When I was looking for a second language to learn after the peak of my CP <em>career</em> in middle school, I was choosing between the garbage-collected languages most popularly used in <a href="https://www.gnu.org/philosophy/free-sw.html">free software</a> at the time, namely Perl, Python and Ruby.  Perl was ruled out due to my fear of <a href="https://raku-advent.blog/2022/12/20/sigils">sigils</a> and I picked up Python as I didn&#39;t want to be a <a href="https://en.wikipedia.org/wiki/Japanophilia#21st_century">weeaboo</a>. Sometimes I wonder how my side projects would have turned out to be had I chosen differently.</p>
<h2 id="day_six">Day Six</h2>
<p>The <a href="https://adventofcode.com/2022/day/6">sixth problem</a> essentially asked for maintaining a finite queue of English letters until all of its letters are distinct.  The most efficient way to do this is employing bit shifting for the FIFO and a bit set for the letters. <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;f82f0b1a08f1">I implemented that</a> literally in the <a href="https://dlang.org/spec/betterc.html">Better C</a> subset of <a href="https://dlang.org">D</a>.</p>
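<p>The same trick can be spelled out in POSIX shell arithmetic, doubling and halving in place of shifts.  This is a hypothetical sketch rather than the committed D solution: each letter toggles its bit in a rolling mask, and the window is all-distinct exactly when the mask has as many set bits as the window is long.</p>
<pre><code class="language-sh"># power of two for a lowercase letter, computed by repeated doubling
bit() {
    b=1 j=$(printf %d "'$1")
    while [ "$j" -gt 97 ]; do b=$(( b * 2 )); j=$(( j - 1 )); done
    echo "$b"
}

# number of set bits in a mask
popcount() {
    m=$1 bits=0
    while [ "$m" -gt 0 ]; do bits=$(( bits + m % 2 )); m=$(( m / 2 )); done
    echo "$bits"
}

# position of the first window of $2 distinct letters in string $1
marker() {
    s=$1 n=$2 mask=0 i=1
    while [ "$i" -le "${#s}" ]; do
        c=$(printf %s "$s" | cut -c "$i")
        mask=$(( mask ^ $(bit "$c") ))       # letter enters the window
        if [ "$i" -gt "$n" ]; then
            o=$(printf %s "$s" | cut -c $(( i - n )))
            mask=$(( mask ^ $(bit "$o") ))   # letter leaves the window
        fi
        if [ "$(popcount "$mask")" -eq "$n" ]; then
            echo "$i"
            return
        fi
        i=$(( i + 1 ))
    done
}

marker mjqjpqmgbljsphdztnvjfqwrcgsmlb 4    # prints 7</code></pre>
<p>The sample string is the first example from the problem statement; swapping 4 for 14 answers part two.</p>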
<p>Although the language is around my age and influenced big names like modern C&#43;&#43;, Swift and Zig,<sup id="fnref:cmp">[5]</sup> its documentation is pretty underwhelming and inconsistent.  For instance, the 128-bit integer type <code>cent</code> is documented as a <a href="https://dlang.org/spec/type.html#basic-data-types">basic data type</a>; however, it only exists in the <a href="https://dlang.org/phobos/core_int128.html">core.int128</a> library with more cumbersome usage &#40;and doesn&#39;t work with <code>dmd -betterC</code>&#41;.</p>
<p>Like with Nim, D compilers also don&#39;t allow hyphens in source filenames, so I had to pipe the code to <code>dmd -of&#61;a.out -</code> &#40;the executable name would be randomized otherwise&#41;.</p>
<h2 id="day_seven">Day Seven</h2>
<p>On the first <a href="https://vine.co/v/iM0HnpBebd0">Wednesday</a> of the month of celebration, the <a href="https://adventofcode.com/2022/day/7">problem</a> was parsing <code>cd</code> and <code>ls</code>-like invocation and output to reconstruct a directory tree and do, uh, tree stuff.  <a href="https://janet-lang.org">Janet</a>&#39;s <a href="https://janet-lang.org/docs/peg.html">PEG module</a> was much more <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;38d8920c7d7c">delightful</a> for parsing than regular expression on steroids like Raku&#39;s <a href="https://docs.raku.org/language/grammars">grammar</a>.</p>
<p>Writing imperative S-expressions felt dirty, though IMHO it&#39;s quite a better take than Lua, understandably so, as it was originally a redesign of <a href="https://fennel-lang.org">Fennel</a>.</p>
<h2 id="day_eight">Day Eight</h2>
<p>The <a href="https://adventofcode.com/2022/day/8">eighth problem</a> could be efficiently solved via dynamic programming on multidimensional arrays so I <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;f8b0528d933f">used</a> Fortran for array programming. There&#39;s not much to say other than that it werkt and, ah yea, dynamic allocation didn&#39;t seem worth the effort so I ended up hardcoding the sizes.</p>
<h2 id="day_nine">Day Nine</h2>
<p>The <a href="https://adventofcode.com/2022/day/9">ninth task</a> was about sparse matrix transformation. Naturally I used a hash table in Tcl for this purpose and the <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;cde44cdda55d">solution</a> was straightforward enough. I am planning on extending a video game&#39;s level configuration to be programmable and the top contenders are now Lua/Fennel, Janet and Tcl.  No idea when I&#39;ll get to it, but I&#39;mma keep ya posted.</p>
<h2 id="day_ten">Day Ten</h2>
<p>On <a href="https://adventofcode.com/2022/day/10">day 10</a>, I needed to build a less-than-basic<sup id="fnref:egg">[6]</sup> calculator. I thought using AWK would spice things up a bit, but it actually simplified the <a href="https://trong.loang.net/~cnx/cp/commit?id&#61;5e4395eab495">solution</a>.  Instead of having to read and parse each operation, the script is executed for each input line, even allowing interleaved matching.  Therefore, the behavior specification could be followed closely without any significant effort to adapt the logic to the language.</p>
<p>I used to think of AWK as just a more verbose sed&#40;1&#41;. I was wrong and am glad that I was.  I guess AWK can come in pretty handy for similar real-world usages, such as log processing or moderately complex transformation of textual data.</p>
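<p>To illustrate the per-line dispatch with a toy input &#40;a hypothetical sketch, not the committed solution&#41;: each instruction pattern advances the clock itself, and the signal strength is sampled as a side effect of ticking.</p>
<pre><code class="language-sh"># ten addx instructions make a 20-cycle toy input; the real puzzle input
# mixes noop (1 cycle) and addx V (2 cycles, after which X increases by V)
for i in 1 2 3 4 5 6 7 8 9 10; do echo 'addx 1'; done |
awk '
    function tick() { c++; if ((c - 20) % 40 == 0) s += c * x }
    BEGIN   { x = 1 }                    # register X starts at 1
    /^noop/ { tick() }
    /^addx/ { tick(); tick(); x += $2 }  # X changes after the second cycle
    END     { print s }                  # total signal strength
'   # prints 200: at cycle 20, X is still 10, and 20 * 10 = 200</code></pre>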
<h2 id="may_day">May Day</h2>
<p><a href="https://en.wikipedia.org/wiki/Oops&#33;..._I_Did_It_Again_&#40;album&#41;">Oops&#33;… I did it again.</a><sup id="fnref:2021">[2]</sup>  If you thought because I published this right after Christmas it must be a complete advent journal, I have played you for absolute fools&#33;  The later problems were increasingly parsing heavy, and while I still had languages I wanted to try, none of those left was designed for text processing.  I was also busy in meatspace at the time, so I couldn&#39;t find the time to write byte-level parsers in languages I didn&#39;t know.</p>
<p>I didn&#39;t try really hard, nor did I get really far, but <a href="https://www.youtube.com/watch?v&#61;eVTXPUF4Oz4">in the end</a> maybe the <a href="https://www.youtube.com/watch?v&#61;l7r-R61W1DQ">real treasure</a> was the experiences I had along the way. I suppose the <a href="https://en.wikipedia.org/wiki/Contact_hypothesis">contact hypothesis</a> <em>might</em> be true, at least in this context<a href="https://www.youtube.com/watch?v&#61;M94ii6MVilw">;</a> my prejudice against many languages had been cleared away after merely surface-level interactions.  You should probably give it a try too, who knows, it could be much <a href="https://en.wiktionary.org/wiki/gay#Middle_English">gay</a>er than you&#39;d expect&#33;</p>
<table class="fndef" id="fndef:master">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">No, I have not been given any slave.</td>
    </tr>
</table><table class="fndef" id="fndef:2021">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">I know, last year I already quit citing how janky later problems were.</td>
    </tr>
</table><table class="fndef" id="fndef:obj">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">camelCase was popularized by mainstream object oriented languages.</td>
    </tr>
</table><table class="fndef" id="fndef:re">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">Not really, reading byte-by-byte would also work, just less cool.</td>
    </tr>
</table><table class="fndef" id="fndef:cmp">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">I feel underachieved now.</td>
    </tr>
</table><table class="fndef" id="fndef:egg">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">No eggs were harmed in the making of the solution.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/advent@cnx%3E&Subject=Re: Advent of Programming Languages">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/advent@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/advent/comments.xml</wfw:commentRss>
</item>
<item>
  <title>De-Dependency December</title>
  <link>https://cnx.gdn/blog/dedep/index.html</link>
  <guid>https://cnx.gdn/blog/dedep/index.html</guid>
  <description>Call for Participation: De-Dependency December</description>
  <category>fun</category><category>pkg</category>
  <pubDate>Thu, 10 Nov 2022 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="de-dependency_december">De-Dependency December</h1>
<blockquote>
<p>As we mature, the dependency graph matures with us.</p>
</blockquote>
<h2 id="exposition">Exposition</h2>
<p>In the <a href="https://www.youtube.com/watch?v&#61;stChOsejLEQ">occasional fights</a> between system and language packagers, <a href="https://man.sr.ht/~cnx/ipwhl">I&#39;m known to take the downstream camp.</a>  As a user, there are lots of things I take for granted.  I install the stuff I need, occasionally upgrade the system, and everything gets updated. Vulnerability in a library used by multiple programs?  Its patched version gets swapped in within a few hours &#40;given it&#39;s not <a href="https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare">vendored or pinned</a>&#41;. <a href="https://wiki.debian.org/Hardening">Most</a> <a href="https://fedoraproject.org/wiki/Changes/Harden_All_Packages">distributions</a> <a href="https://wiki.archlinux.org/title/Arch_package_guidelines/Security">even</a> <a href="https://en.opensuse.org/openSUSE:Security_Features">apply</a> <a href="https://wiki.gentoo.org/wiki/Hardened/Toolchain#Changes">hardening</a> <a href="https://nixos.org/manual/nixpkgs/stable#sec-hardening-in-nixpkgs">flags</a> that <a href="https://xeiaso.net/blog/openssl-alarm-fatigue">some bugs aren&#39;t even exploitable in the first place</a>.  They create a <a href="https://www.youtube.com/watch?v&#61;205ODJgAEik">safe place</a> for me to comfortably express myself at work and at home.</p>
<p>Recently on my work computer, I&#39;ve switched to Guix System, which does not have that many packages yet.  Looking into ways to package programs I use and the ongoing efforts, I realized the colossal number of transitive dependencies of <a href="https://issues.guix.gnu.org/55903">certain software</a> and the impracticality for a user union &#40;i.e. a distro&#41; to maintain such a set of <a href="https://raku-advent.blog/2021/12/06/unix_philosophy_without_leftpad">micro packages</a> in every language.</p>
<h2 id="confrontation">Confrontation</h2>
<p>This gave me a more serious thought on software sustainability.  Such a topic often reminds us of energy consumption, modularity, development model, or even style &#40;clean code&#41;.  End users &#40;including self-hosters&#41;, on the other hand, ask the following questions to decide whether to install and keep a piece of software:</p>
<ul>
<li><p>Can I <em>trust</em> installing this won&#39;t do anything funny to my machine?</p>
</li>
<li><p>How much <a href="https://xkcd.com/303">effort</a> do I need to prevent people from doing funny things to my machine if the software includes <a href="https://heartbleed.com">something that gets on the front page of some magazines</a> tomorrow?</p>
</li>
<li><p>How much of my limited resources will it take to run or <a href="https://ludocode.com/blog/flatpak-is-not-the-future">simply exist</a>?</p>
</li>
</ul>
<p>There are certain intersections between the concerns of enterprises and users; however, it&#39;s worth noticing that distributions are almost exclusively optimized to cater to the users&#39; needs.  Not only do they <a href="https://en.wikipedia.org/wiki/Tron">fight for the users</a>, they <em>are</em> the users.  If you don&#39;t want to write yellow-glowing programs, you should <a href="https://drewdevault.com/2021/09/27/Let-distros-do-their-job.html">make the life of downstream package maintainers easier</a>. No, it does not count if you push them to give in and run <a href="https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/go/module.nix"><code>go mod vendor</code></a> or <a href="https://github.com/svanderburg/node2nix">download from NPM recursively</a>.</p>
<h2 id="resolution">Resolution</h2>
<p><em>So how do I write software that is easy to package</em>, you may ask. If you followed the articles linked above, you&#39;ve probably already figured that out.  It&#39;s less about what you <em>write</em> and more about what you <em>use</em>.  When someone complains that a program is difficult to build from source, certainly it&#39;s not about how hard it is to type, say, <code>make install</code>, but how hard it is to acquire the dependencies for that to run successfully and for the result to work.</p>
<p>Lowering the number of dependencies will absolutely help.  To put it bluntly, you can&#39;t have a problem with dependencies if there aren&#39;t any. This may sound like reinventing the wheel, but if the use case is common enough, you might find what you need in the standard library.<sup id="fnref:rust">[1]</sup> I&#39;ve been restricting myself from using third-party libraries for new side projects, and it actually worked for my most recent ones:</p>
<ul>
<li><p><a href="https://trong.loang.net/phylactery">Phylactery</a>, a static comics web server on Go with <a href="https://en.wikipedia.org/wiki/Comic_book_archive">CBZ</a> parsing and concurrent request handling</p>
</li>
<li><p><a href="https://trong.loang.net/~cnx/fead">Fead</a>, an <a href="https://en.wikipedia.org/wiki/Static_site_generator">SSG</a> plugin in Python for advertising others’ feeds, with parallel HTTP requests, parsing of RSS 2 and Atom, and CLI argument parsing</p>
</li>
</ul>
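<p>As a taste of how far the standard library can go, the following sketch &#40;illustrative, not Fead&#39;s actual code&#41; extracts item titles from an RSS 2 document with nothing but Python&#39;s <code>xml.etree</code>:</p>
<pre><code class="language-python">from xml.etree.ElementTree import fromstring

def titles(rss):
    """Return the title of every item in an RSS 2.0 document."""
    return [item.findtext('title')
            for item in fromstring(rss).iter('item')]</code></pre>
<p>It won&#39;t handle every feed quirk in the wild, but for a well-formed feed it is all the parser a small plugin needs.</p>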
<p>Even for such simple use cases, there are still many libraries in the wild that can handle more data formats, are more convenient to use or more performant.  On the other hand, the amount of maintenance needed to keep the programs safe indefinitely for a user is much lower thanks to the small dependency footprint.</p>
<p>What I&#39;m asking you to give a try in the advent days<sup id="fnref:advent">[2]</sup> is not as drastic. Look through your works, find a library you require for only a small portion of its <a href="https://www.youtube.com/watch?v&#61;3Mpyias9ek4">power</a>, or something that can be implemented specifically for your project with reasonable effort &#40;w.r.t. the whole codebase&#41;.  This is not just for the sake of maintainability: <a href="https://guide.handmade-seattle.com/c/2021/context-is-everything">being less general, the new implementation can likely outperform the replaced public library</a>.</p>
<p><img src="https://cnx.gdn/assets/outlets.jpg" alt="Multiple types of sockets installed on the same wall" /></p>
<p>In many cases, you will find yourself making use of the standard library. Standards make life much easier, <a href="https://xkcd.com/927">if only</a> people could come up with an agreement.  Or maybe they don&#39;t have to.  Maybe each could choose among the <a href="https://raku-advent.blog/2021/12/11/unix_philosophy_without_leftpad_part2">utility libraries</a>.  At the end of the day, what matters is the total number of packages that can have bugs to be reported upstream and patched.</p>
<p>That being said, please keep an eye on the standard library the same way you &#40;should&#41; watch your other dependencies, just in case what you need is finally added.  Worry not about backward incompatibility: <a href="https://wiki.debian.org/DontBreakDebian#Don.27t_suffer_from_Shiny_New_Stuff_Syndrome">users of LTS systems are content with older versions</a> of your software.</p>
<h2 id="fall_and_catastrophe">Fall and Catastrophe</h2>
<p>Just kidding, I&#39;m offering <a href="https://en.wikipedia.org/wiki/Three-act_structure">answers</a>, not <a href="https://en.wikipedia.org/wiki/Dramatic_structure#Freytag&#39;s_pyramid">tragedies</a>.  Winter is coming, join me in a De-Dependency December and fight for the users&#33;</p>
<table class="fndef" id="fndef:rust">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Unless you use Rust.</td>
    </tr>
</table><table class="fndef" id="fndef:advent">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">I&#39;m not Christian, but I had fun with <a href="https://adventofcode.com">AoC</a> and <a href="https://breezewiki.com/neopets/wiki/Advent_Calendar">Neopets</a> before.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/dedep@cnx%3E&Subject=Re: De-Dependency December">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/dedep@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/dedep/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Green Leaf Soup</title>
  <link>https://cnx.gdn/blog/greens/index.html</link>
  <guid>https://cnx.gdn/blog/greens/index.html</guid>
  <description>An easy template for making savory soup from green leaf vegetables</description>
  <category>lyf</category><category>recipe</category>
  <pubDate>Sun, 11 Sep 2022 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="green_leaf_soup">Green Leaf Soup</h1>
<p>At the time of writing, I am sharing a kitchen with around 40 people from all parts of the world.  Very often, someone asks me to share a recipe from my cuisine, and I usually have to decline, blaming the lack of fresh ingredients,<sup id="fnref:fresh">[1]</sup> since I can only afford to shop for groceries weekly.  While most Vietnamese dishes call for fresh meat &#40;I can only buy refrigerated meat&#41;, in certain cases it doesn&#39;t really matter. One quick dish that can tolerate days-old meat is a soup of green leaf vegetables.</p>
<p>Back when I was still at home, a meal almost always consisted of a soup. When we were about to finish a bowl of rice, we would mix in the soup to wash down all of the gelatinized starch &#40;into our mouths&#41;.  The soup could be anything from boiled vegetable broth to <a href="https://en.wikipedia.org/wiki/Basella_alba">vine spinach</a> and jute soup with crab juice.  In that range of difficulty, I would rate the following recipe somewhere in the lower middle.</p>
<h2 id="ingredients">Ingredients</h2>
<p>For a vegetable soup, of course you need a lot of veggie.  As much as you can eat.  I would recommend at least two handfuls per serving<sup id="fnref:amount">[2]</sup> of any <a href="https://en.wikipedia.org/wiki/Brassica">Brassica</a> leaves, e.g. mustard greens, spoon cabbage or regular cabbage.<sup id="fnref:alt">[3]</sup>  The greener the plant, the less starchy it is and the better it blends with the umami of the meat.</p>
<p>As for the animal product, minced or ground pork is a common choice. Minced chicken, fish or dried shrimp also work, but IMHO beef, lamb or goat could overpower the veggie.  Meat is not the star of the show and should be used in moderation; 50 grams<sup id="fnref:imperial">[4]</sup> would be generous. There is no vegan variation of this dish AFAICT, except for reducing it to just water, leaves and seasoning, but even a child could cook that without a recipe.</p>
<p>In addition, a shallot is required for searing and <a href="https://en.wikipedia.org/wiki/Fish_sauce#Vietnam">fish sauce</a> for seasoning. It is OK-ish to use onion in place of shallot;<sup id="fnref:onion">[5]</sup> I am not a fan of using soy sauce in this dish though.  Super salt &#40;table salt and MSG 9:1 mix&#41; is a better substitution in case you can&#39;t get your hands on <em>the</em> signature Vietnamese seasoning.</p>
<p>Last but not least, it would not be a soup without water. A cup should be enough to submerge the cooked veggie.</p>
<h2 id="preparation">Preparation</h2>
<p>First, wash and slice the vegetables and throw them in a colander to let the water drain off.  Next, chop the shallot <em>thinly</em>.</p>
<p>If you bought minced or ground meat, you are done preparing. Otherwise, it&#39;s mincing/grinding time, duh&#33;</p>
<h2 id="cooking">Cooking</h2>
<p>Turn the stovetop to medium high and put on a stainless steel pan or pot. It doesn&#39;t have to be stainless steel; anything smooth without a polymer coating would do.  Pour in a touch of cooking oil &#40;or a tiny spoon of lard&#41; and start sautéing the shallot.</p>
<p>As soon as the pot is hot enough, immediately add the meat &#40;don&#39;t wait for the shallot to turn golden brown, the slices are thin enough to be caramelized as the meat is seared&#41;.  You don&#39;t need to stir since we don&#39;t need to cook it evenly right now, but don&#39;t let it stick together. Use a spoon or a scraper to break it up and press it down for faster searing.</p>
<p>If you have fish sauce, pour it in after the meat finishes browning and let the flavor develop for a few seconds.  Then, deglaze the pot with water and bring it to a boil.  Throw the leaves into the pot and get the water boiling again.  In case you use salt for seasoning, now is the time to sprinkle it into the soup.  Let it cook for another two or three minutes &#40;a radiant or conductive coil can be switched off and still maintain the heat for that duration&#41; and it&#39;s ready to serve&#33;</p>
<table class="fndef" id="fndef:fresh">
    <tr>
        <td class="fndef-backref">[1]</td>
<td class="fndef-content">The <em>good</em> Vietnamese food I grew up eating was always made from the freshest ingredients.</td>
    </tr>
</table><table class="fndef" id="fndef:amount">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">From now on, the amount of each ingredient is listed for one serving.</td>
    </tr>
</table><table class="fndef" id="fndef:alt">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Outside of the genus, <a href="https://en.wikipedia.org/wiki/Sauropus_androgynus">rau ngót</a> is awesome if available.</td>
    </tr>
</table><table class="fndef" id="fndef:imperial">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">Or about a dozen bullets in eagle and burger unit.</td>
    </tr>
</table><table class="fndef" id="fndef:onion">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Not a whole onion, but around the size of your thumb per serving.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/greens@cnx%3E&Subject=Re: Green Leaf Soup">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/greens@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/greens/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Comments for Static Sites without JavaScripts</title>
  <link>https://cnx.gdn/blog/reply/index.html</link>
  <guid>https://cnx.gdn/blog/reply/index.html</guid>
  <description>Comments for Static Sites without JavaScript via Emails</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Sun, 09 Jan 2022 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="comments_for_static_sites_without_javascripts">Comments for Static Sites without JavaScripts</h1>
<blockquote>
<p>I&#39;m open for criticism<br />But really, is it any room for criticism?</p>
</blockquote>
<p>Recently, I&#39;ve switched my <a href="https://en.wikipedia.org/wiki/Web_feed">feed</a> reader from <a href="https://newsboat.org">Newsboat</a> to <a href="https://lzone.de/liferea">Liferea</a>. The latter has a GUI and some extra features which make the experience a lot more comfy.  For instance, custom enclosure handling lets me finally migrate all of my YouTube subscriptions to <a href="https://en.wikipedia.org/wiki/Atom_&#40;Web_standard&#41;">Atom</a> and <em>conveniently</em> browse and watch videos using <a href="https://mpv.io">mpv</a>.  Image support also allows me to directly view web comics.<sup id="fnref:image">[1]</sup>  One of them, <a href="https://themonsterunderthebed.net">The Monster Under the Bed</a>,<sup id="fnref:nsfw">[2]</sup> does not embed the strips in its feed, but it has comments.</p>
<p>Yes, <a href="https://www.rssboard.org/rss-specification">RSS</a> includes support for <code>&lt;comments&gt;</code>, and I was not aware of it until <a href="https://nixnet.social/notice/AEO3fYbuzYCJl85eD2">very recently</a>.  I suppose many other people late to the &#40;web feed&#41; party aren&#39;t either.  Since the rise of static sites, feeds have regained popularity, enough for even <a href="https://www.theregister.com/2021/05/20/google_rss_chrome_android">Google to reconsider its direction</a>.  Compared to RSS or Atom, alternatives have the following shortcomings:</p>
<ul>
<li><p><a href="https://en.wikipedia.org/wiki/Usenet">Usenet</a> is generally obsolete to most people.</p>
</li>
<li><p><a href="https://en.wikipedia.org/wiki/Mailing_list">Mailing list</a> messages are immutable.</p>
</li>
<li><p>Fora and social media are silos.<sup id="fnref:silo">[3]</sup></p>
</li>
<li><p>Social media are designed for ephemeral discussions.</p>
</li>
<li><p>Instant messaging is awful for archival.</p>
</li>
</ul>
<p>On the other hand, news feeds are commonly read-only: only a few readers can render comments and even fewer are able to post one.  On the server side, a dynamic server is needed to accept comments.  Traditionally, it&#39;s the same as the system serving the website.  Although this works, it is significantly more costly than a server dedicated to static sites, which scale a lot better.</p>
<p><a href="https://en.wikipedia.org/wiki/Hacker">Hackers</a> have come up with multiple workarounds such as using <a href="https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon">microblogging</a> or <a href="https://cactus.chat">instant messaging</a> to add comments to their static sites, but all require client-side code execution, which is an option for neither RSS nor Atom.  Furthermore, <a href="https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html">JavaScript hurts portability and performance</a> on the WWW, hence it should be avoided unless it is absolutely impossible to implement a feature otherwise.  Commenting is not an exception.</p>
<p>Following is my adventure implementing a comment section for this very blog. If you&#39;re also up to the task, I think you should view what I did as an inspiration &#40;rather than a reference&#41; and don&#39;t be afraid to experiment around until satisfaction.</p>
<div class="franklin-toc"><ol><li>Choosing Back-End</li><li>Designing Data Flow</li><li>Implementation<ol><li>Accepting Replies</li><li>Rendering Comments</li><li>Injecting Comments</li></ol></li><li>Moderation</li></ol></div>
<h2 id="choosing_back-end">Choosing Back-End</h2>
<p>As mentioned earlier, static sites or not, there still needs to be a dynamic component to accept incoming replies.  HTTP requests would be the most portable since all netizens obviously have a web browser, but those are what we&#39;re trying to replace here.  What else does everyone have nowadays? Something so common that it can be used to identify people upon service registration?  Exactly, emails and phone numbers&#33;</p>
<p>OK, Imma stop horsing around.  My back-end of choice would be emails. It&#39;s global, it&#39;s cheap and it&#39;s federated.  Cellular services almost fit the bill, except that it would cost an arm and a leg for one to comment around the web every day via SMS, whose character limit does not facilitate thoughtful discussions either.  As for fora, social media or instant messaging, no platform has nearly as large a user base as electronic mail.</p>
<p><img src="https://cnx.gdn/assets/html5-js.png" alt="HTML is often a trojan horse for JavaScript" /></p>
<p>It&#39;s not like any email would fit the comment section though.  Especially not the HTML kind with a few hundred kilobytes of embedded CSS, JS and non-content images.  From the security standpoint alone &#39;tis already a no-go.  A light markup language like Markdown<sup id="fnref:mime">[4]</sup> would be much better.</p>
<p>One great thing about using a mature technology like email is that we have all use cases covered.  Filtering, exporting and parsing emails work out of the box regardless of one&#39;s provider, <a href="https://en.wikipedia.org/wiki/Email_client">MUA</a> and programming preferences.  I have a SourceHut account with which I can create mailing lists on demand, so I&#39;m using it; however, there&#39;s no reason exporting from your private inbox is any more difficult, presuming you have set up <a href="https://drewdevault.com/2021/05/17/aerc-with-mbsync-postfix.html">offline email</a>.</p>
<div class="admonition note"><p class="admonition-title">Tips and tricks</p><p>Speaking of SourceHut, exporting a mailing list archive is rather easy, one could either use the button on the web UI or download from the API. As the operation is not exactly cost-free, the former is protected by a <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a> token and the latter by <a href="https://man.sr.ht/meta.sr.ht/oauth.md">OAuth 2.0</a>.  If you are a fellow <a href="https://sr.ht">sr.ht</a> user, you can use <a href="https://man.sr.ht/builds.sr.ht/manifest.md#tasks">acurl</a> on the build service with the URL from the <a href="https://lists.sr.ht/graphql">GraphQL</a> <code>query &#123; me &#123; lists &#123; results &#123; name, archive &#125; &#125; &#125; &#125;</code>.</p>
</div>
<div class="admonition note"><p class="admonition-title">Update</p><p>I stopped paying for sr.ht in May 2024 after years of Sourcehut failing to show any measurable progress towards reaching the beta status. I am now using public-inbox for public, eh, inboxes.</p>
</div>
<h2 id="designing_data_flow">Designing Data Flow</h2>
<p>I promise, this sounds bigger than it really is, but first, let&#39;s have a glance at how static site generators work.  Typically, templating happens three times:</p>
<ol>
<li><p>Conversion of individual articles into HTML <em>content</em></p>
</li>
<li><p>Inserting each article content in a page template to create a complete HTML document</p>
</li>
<li><p>Inserting multiple HTML contents into one RSS or Atom feed template</p>
</li>
</ol>
<p>At completion, two kinds of output are generated: website and web feed. Similarly, comments have to be rendered for both targets: an HTML comment section for web browsing and a separate RSS feed for each article&#39;s <code>&lt;wfw:commentRss&gt;</code>.<sup id="fnref:wfw">[5]</sup>  Therefore, injections should be done separately at stage 2 and 3.  The overall process of static site generation with email comments is illustrated as follows.</p>
<p><img src="https://cnx.gdn/assets/formbox.svg" alt="Data transformation during generation process" /></p>
<p>For clarity, HTML and RSS input templates for comments and their parent page and web feed are omitted.  Path to each <em>comment feed</em> output being injected in the respective <em>web feed item</em> is also not shown in the figure.</p>
<h2 id="implementation">Implementation</h2>
<p>At the time of writing, this personal website of mine was generated by <a href="https://julialang.org">Julia</a> <a href="https://franklinjl.org">Franklin</a>, which was neither fast<sup id="fnref:speed">[6]</sup> nor <a href="https://github.com/tlienart/Franklin.jl/issues/936">semantic</a>, but was the only one I knew supporting LaTeX prerendering out of the box. Franklin is also rather <a href="https://franklinjl.org/syntax/utils">extendable</a> via Julia functions.</p>
<h3 id="accepting_replies">Accepting Replies</h3>
<p>Let&#39;s start with how each article can be programmatically and uniquely identified.  By default in RSS, a <a href="https://www.rssboard.org/rss-profile#element-channel-item-guid">GUID</a><sup id="fnref:guid">[7]</sup> is the permanent URL of the associated web page.  I am not exactly a creative person, so I mirrored this idea, although I only used the difference between URLs, i.e. minus the scheme, network location and trailing <code>index.html</code> &#40;Franklin always appends it to the target path of any source file that is neither <code>index.md</code> nor <code>index.html</code>&#41;:</p>
<pre><code class="language-julia">dir_url&#40;&#41; &#61; strip&#40;dirname&#40;locvar&#40;:fd_url&#41;&#41;, &#39;/&#39;&#41;
message_id&#40;&#41; &#61; &quot;&#37;3C&#36;&#40;dir_url&#40;&#41;&#41;@cnx&#37;3E&quot;</code></pre>
<p>For maximum portability, threading identification is done via emails&#39; <code>In-Reply-To</code> header, which expects a message ID, which must match <code>&lt;.&#43;@.&#43;&gt;</code>.  Once again, to avoid having to think, I opted for the path difference for the left hand side and my nickname <code>cnx</code> for the right.  The <code>mailto</code> URI could then be constructed accordingly:</p>
<pre><code class="language-julia">using Printf: @sprintf

function hfun_mailto_comment&#40;&#41;
  @sprintf&#40;&quot;mailto:&#37;s?&#37;s&#61;&#37;s&amp;&#37;s&#61;Re: &#37;s&quot;,
           &quot;cnx.site@loa.loang.net&quot;,
           &quot;In-Reply-To&quot;, message_id&#40;&#41;,
           &quot;Subject&quot;, locvar&#40;:title&#41;&#41;
end</code></pre>
<p>The anchor was then added to the page foot:</p>
<pre><code class="language-html">&lt;a href&#61;&quot;&#123;&#123;mailto_comment&#125;&#125;&quot;
   title&#61;&quot;Reply via email&quot;&gt;&#123;&#123;author&#125;&#125;&lt;/a&gt;</code></pre>
<h3 id="rendering_comments">Rendering Comments</h3>
<p>This is when the fun begins.  Julia&#39;s standard library does not include an email parser, and I doubt your favorite language does either, unless it is named after a British comedy troupe.  Python is often described as <em>batteries included</em>, or at least it used to be &#40;seemingly the consensus among current core devs has shifted towards <a href="https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068">favoring third-party libraries</a>&#41;.</p>
<div class="admonition note"><p class="admonition-title">Off-topic rambling</p><p>Standard library inclusion wasn&#39;t really the deal breaker here though. I still needed a Markdown engine and an HTML sanitizer &#40;because Markdown can include HTML&#41;, and AFAICT no stdlib has them.  The real issue was the lack of Julia packaging on most distributions &#40;apart from Guix&#41;, and most certainly <a href="https://github.com/NixOS/nixpkgs/issues/20649">not on NixOS</a>, my current distro.  For the same reason, the idea of rewriting Franklin in Python has been running in my head for a while now.  Python packaging is much more downstream-friendly and, unlike Julia, compilation overhead is almost non-existent.</p>
</div>
<p>On the other hand, it&#39;s trivial to pipe an external program&#39;s output to Julia, e.g. <code>readchomp&#40;&#96;echo foo bar&#96;&#41;</code> would give you the string &quot;foo bar&quot;.  Thus, the to-be-written <em>comment generator</em> should take &#40;the path to&#41; a mailbox, the message ID of the article and a template, and write the result to stdout. Argument parsing is, again, thankfully in Python&#39;s stdlib:</p>
<pre><code class="language-python">from argparse import ArgumentParser
from pathlib import Path
from urllib.parse import unquote

parser &#61; ArgumentParser&#40;&#41;
parser.add_argument&#40;&#39;mbox&#39;&#41;
parser.add_argument&#40;&#39;id&#39;, type&#61;unquote&#41;
parser.add_argument&#40;&#39;template&#39;, type&#61;Path&#41;
args &#61; parser.parse_args&#40;&#41;</code></pre>
<p>I then parsed the <a href="https://datatracker.ietf.org/doc/html/rfc4155">mbox</a> into a mapping indexed by parent message IDs as follows.  Those IDs are stored raw in the headers, which is why the percent-encoded input message ID needed to be unquoted to match.</p>
<pre><code class="language-python">from collections import defaultdict
from email.utils import parsedate_to_datetime
from mailbox import mbox

date &#61; lambda m: parsedate_to_datetime&#40;m&#91;&#39;Date&#39;&#93;&#41;.date&#40;&#41;
archive &#61; defaultdict&#40;list&#41;
for message in sorted&#40;mbox&#40;args.mbox&#41;, key&#61;date&#41;:
    archive&#91;message&#91;&#39;In-Reply-To&#39;&#93;&#93;.append&#40;message&#41;</code></pre>
<p>As said earlier, arbitrary HTML content is not exactly suitable for comments. However, it is undeniable that HTML emails have taken over the world and compromises must be made: allowing <code>multipart/alternative</code> of both <code>text/plain</code> and <code>text/html</code>.  That is not the only use of multipart, though: attachments and cryptographic signatures are multiparts too.  Since we are only interested in the plaintext part, it is actually easier done than said to extract it:</p>
<pre><code class="language-python">from bleach import clean, linkify
from markdown import markdowndef get_body&#40;message&#41;:
    if message.is_multipart&#40;&#41;:
        for payload in map&#40;get_body, message.get_payload&#40;&#41;&#41;:
            if payload is not None: return payload
    elif message.get_content_type&#40;&#41; &#61;&#61; &#39;text/plain&#39;:
        body &#61; message.get_payload&#40;decode&#61;True&#41;
        return clean&#40;linkify&#40;markdown&#40;body.decode&#40;&#41;, output_format&#61;&#39;html5&#39;&#41;&#41;,
                     tags&#61;..., protocols&#61;...&#41;
    return None</code></pre>
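<p>To see the traversal on its own, here is a minimal sketch using only the standard library; the sanitization through bleach is deliberately left out, and <code>plain_part</code> is a hypothetical name, not part of the actual script:</p>

```python
from email.message import EmailMessage

def plain_part(message):
    # Same walk as get_body above, minus the cleaning: recurse into
    # multiparts and return the first text/plain payload found.
    if message.is_multipart():
        for part in map(plain_part, message.get_payload()):
            if part is not None:
                return part
    elif message.get_content_type() == 'text/plain':
        return message.get_payload(decode=True).decode()
    return None

message = EmailMessage()
message.set_content('plaintext wins')
message.add_alternative('<p>HTML loses</p>', subtype='html')
print(plain_part(message).strip())  # plaintext wins
```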
<p>Now all that&#39;s left is to render that body and relevant headers as an HTML segment or an RSS item.  This is when we revisit the template. Jinja is probably the most popular in Python, thanks to Django and Flask, but its complexity is rather unnecessary.  Instead, I went with the built-in <code>str.format</code>.</p>
<p><img src="https://cnx.gdn/assets/format.jpg" alt="Double braces are brilliant, but I prefer single ones" /></p>
<p>What are templates for, exactly?  Not the complete document, apparently, because that would differ from article to article and increase the complexity of injection.  Neither a single comment, as comments are threaded into trees &#40;or a forest&#41; and their relationship can be useful.  We gotta <a href="https://genius.com/Timbaland-meet-in-tha-middle-lyrics">meet in tha middle</a> and use recursive templates instead, e.g. for nested comments:</p>
<pre><code class="language-html">&lt;div class&#61;comment&gt;
  ...
  &#123;children&#125;
&lt;/div&gt;</code></pre>
<p>To render linear comments, such as for <code>&lt;wfw:commentRss&gt;</code>, simply move the children out of the item as follows.</p>
<pre><code class="language-xml">&lt;item&gt;
  ...
&lt;/item&gt;
&#123;children&#125;</code></pre>
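<p>As a toy illustration of how such a recursive template plays with <code>str.format</code> &#40;a hypothetical sketch, not formbox itself; the archive is simplified to a mapping from parent ID to pairs of ID and body&#41;:</p>

```python
TEMPLATE = '<div class=comment>\n{body}\n{children}\n</div>'

def render(archive, parent):
    # Each comment's children are rendered inside its own template,
    # so threads nest as deeply as the replies go.
    return '\n'.join(
        TEMPLATE.format(body=body, children=render(archive, mid))
        for mid, body in archive.get(parent, ()))

archive = {None: [('a@x', 'First!')],
           'a@x': [('b@x', 'Replying to a@x')]}
print(render(archive, None))
```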
<p>The remaining substitutions are mostly just extracted from the email&#39;s headers. Another bit that needs some extra decisions, though, is the parameters for the <code>mailto</code> URI to reply to each comment:</p>
<ul>
<li><p><code>In-Reply-To</code> set to current <code>Message-Id</code></p>
</li>
<li><p><code>Cc</code> set to the current <code>Reply-To</code> &#40;if present&#41; or <code>From</code></p>
</li>
<li><p><code>Subject</code> is inherited, with <code>Re:</code> prepended if missing</p>
</li>
</ul>
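<p>Sketched in Python for concreteness &#40;<code>reply_uri</code> is a hypothetical helper, not the actual formbox code&#41;, the three rules above could look like:</p>

```python
from email.message import EmailMessage
from urllib.parse import quote

def reply_uri(message, list_addr):
    # Inherit the subject, prepending Re: only when it is missing.
    subject = message.get('Subject', '')
    if not subject.lower().startswith('re:'):
        subject = 'Re: ' + subject
    params = (('In-Reply-To', message['Message-Id']),
              ('Cc', message.get('Reply-To') or message['From']),
              ('Subject', subject))
    query = '&'.join(f'{key}={quote(value)}' for key, value in params)
    return f'mailto:{list_addr}?{query}'

message = EmailMessage()
message['Message-Id'] = '<original@example>'
message['From'] = 'alice@example.com'
message['Subject'] = 'Hello'
print(reply_uri(message, 'comments@example.com'))
```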
<p>This is getting boring with a lot of trivial code, so I&#39;ll leave you with a pointer to the completed script named <a href="https://trong.loang.net/~cnx/formbox">formbox</a> and move on to more interesting stuff.</p>
<h3 id="injecting_comments">Injecting Comments</h3>
<p>Inserting HTML comment sections is pretty straightforward.  First I wrote a Julia function <code>render_comments</code> calling <code>formbox</code> under the hood, then</p>
<pre><code class="language-julia">hfun_comments_rendered&#40;&#41; &#61; render_comments&#40;&quot;comment.html&quot;&#41;</code></pre>
<p><code>comments_rendered</code> is then injected below the article.  For RSS, it took a couple of extra steps:</p>
<ol>
<li><p>Insert <code>render_comments&#40;&quot;comment.xml&quot;&#41;</code> into the comment feed template <code>comments.xml</code> &#40;note they are two different templates&#41; and write it next to the article&#39;s output <code>index.html</code></p>
</li>
<li><p>Insert the path of the written comment feed to the <code>&lt;wfw:commentRss&gt;</code> tag in the article&#39;s feed item</p>
</li>
</ol>
<p>That&#39;s it&#33;</p>
<h2 id="moderation">Moderation</h2>
<p>I don&#39;t want a <em>Terms of Services</em> page, it&#39;d feel too corporate for my <em>personal</em> website, so I will list the rules here:</p>
<ol>
<li><p>Please be excellent to each other.  Disagreements are okay, personal insults are not.</p>
</li>
<li><p>Stay on topic.  If you want to publicly discuss with me about something else, start a new thread on a <a href="https://loa.loang.net/cnx.misc">mailing list</a> or reach me via social media.</p>
</li>
<li><p><a href="https://useplaintext.email">Use plaintext emails</a> and do not top post.  Markdown inline markups, block quotes, lists and code blocks are supported.</p>
</li>
<li><p>Comments are implied to be under <a href="https://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0</a> unless declared otherwise.</p>
</li>
<li><p>I reserve the right to remove any comment I don&#39;t like. I generally don&#39;t delete comments, but if you want to exercise your freedom of speech, publish it yourself.</p>
</li>
<li><p>I do not warrant the availability of the comments either. I will try my best but one day all comments may just disappear, just like this website itself.  Archive what you deem important.</p>
</li>
<li><p>These rules are subject to change according to my personal liking without notice.</p>
</li>
</ol>
<p>Replies will only be rendered on the website and feed after I see them, so please expect a delay of at least 24 hours.  If you are eager to reply to each other, subscribe to the <a href="https://loa.loang.net/cnx.site">site&#39;s mailing list</a> instead.</p>
<table class="fndef" id="fndef:image">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">TBF there are image preview scripts in Newsboat&#39;s <a href="https://drewdevault.com/2020/06/06/Add-a-contrib-directory.html">contrib</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:nsfw">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Content warning: occasionally NSFW</td>
    </tr>
</table><table class="fndef" id="fndef:silo">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Federation is getting there for social media; not so much for fora.</td>
    </tr>
</table><table class="fndef" id="fndef:mime">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">But don&#39;t use <a href="https://blog.brixit.nl/markdown-email">text/markdown</a> for your emails.</td>
    </tr>
</table><table class="fndef" id="fndef:wfw">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Unfortunately there&#39;s no equivalence for Atom.</td>
    </tr>
</table><table class="fndef" id="fndef:speed">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">Over 30 seconds to generate a few hundred kB of web pages.</td>
    </tr>
</table><table class="fndef" id="fndef:guid">
    <tr>
        <td class="fndef-backref">[7]</td>
        <td class="fndef-content">Not to be confused with the micro soft hijacked term for <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a>.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/reply@cnx%3E&Subject=Re: Comments for Static Sites without JavaScripts">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/reply@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/reply/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Generic Homemade Ham</title>
  <link>https://cnx.gdn/blog/gotham/index.html</link>
  <guid>https://cnx.gdn/blog/gotham/index.html</guid>
  <description>An easy template for making uncured ham or similar brined pork</description>
  <category>lyf</category><category>recipe</category>
  <pubDate>Fri, 19 Nov 2021 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="generic_homemade_ham">Generic Homemade Ham</h1>
<p>Where I&#39;m from, hams are stupid expensive due to the lack of demand. This is unacceptable because I <em>love</em> hams&#33;<sup id="fnref:hamm">[1]</sup>  After years of not tasting even a single slice, I decided to make some myself, noting down what works and what doesn&#39;t.</p>
<p>Unlike other stuff you usually find on the interweb, the following recipe will not require any fancy equipment,<sup id="fnref:equipment">[2]</sup> chemicals<sup id="fnref:chemical">[3]</sup> or quantities that &#40;should&#41; only appear in a math textbook.  It will also try to be flexible, so that you are free to experiment with whatever you feel like that day, while knowing for sure you&#39;ll still end up with something at least remotely resembling a piece of ham.</p>
<h2 id="brining">Brining</h2>
<p>Making ham, like any other food, comprises only two steps: preparing and cooking.  Brining not only makes the meat salty<sup id="fnref:self">[4]</sup> but also enhances its tenderness by breaking down the proteins.</p>
<p>The most important ingredients for this process are meat, salt and sugar. As for the meat, it&#39;s preferably from a pig&#39;s thigh, but anything with a similar texture will do.  You do want a cut with parallel muscles to minimize the amount of silver skin and tendon though, plus it will have better presentation.  As always, intramuscular fat is a delicious cherry on top, but not too crucial in this case.  On the other hand, any kind of salt and sugar would do.  Personally I use sea salt and brown sugar because they are the cheapest to be found locally, whilst they add some extra flavors and minerals.</p>
<h3 id="dry">Dry</h3>
<p>Dry brining is only suitable for &#40;family-&#41;serving-size cuts of meat, somewhere from 200 to 500 grams.  Anything larger would have trouble absorbing the seasoning.  Simply cover the meat in coarse salt and sugar and leave it in the fridge from a few hours to overnight, depending on its mass.</p>
<p>How much seasoning?  Be generous, but you&#39;d still want to be able to see the meat underneath.  I don&#39;t think you can overseason it; just remember to rinse off the remaining rub before cooking.  As for the ratio, I like to use twice as much salt as sugar, but I&#39;ve seen people doing 1:1 or even 1:2.</p>
<h3 id="wet">Wet</h3>
<p>The brine formula I&#39;m about to describe is heavily influenced by <a href="https://www.youtube.com/watch?v&#61;5fm3lNM5vV4">Mike G&#39;s recipe</a>, which is also for uncured ham.  First, pour enough water to submerge the meat into a pot &#40;no, don&#39;t put the meat in the pot&#41; and heat it up.  If you have a fairly fitting container, the amount is close to the mass of the meat itself.</p>
<p>Then, add 5&#37; salt, 3&#37; sugar, and whatever spices can go well with your future ham.  I usually use a few bay leaves, some thyme and crushed peppercorn, but any aromatic, fresh or dry, should work. You don&#39;t have to be exact with the amount of seasoning either: if you don&#39;t have a scale, measure with a spoon and be generous.  Due to the lack of nitrate, the brining shouldn&#39;t occur for more than a few days and the more concentrated the solution, the faster the absorption.</p>
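<p>For those who like the numbers spelled out, a quick back-of-the-envelope calculation &#40;assuming, as above, that the water roughly matches the meat&#39;s mass&#41;:</p>

```python
def brine_amounts(meat_mass, salt_ratio=0.05, sugar_ratio=0.03):
    # Water roughly equals the meat's own mass in a fitting container;
    # salt and sugar are measured against the water.
    water = meat_mass
    return water, water * salt_ratio, water * sugar_ratio

water, salt, sugar = brine_amounts(1000)  # grams, for a 1 kg cut
print(water, salt, sugar)  # 1000 50.0 30.0
```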
<p>Let the brine cool down, pour it in a container, drown the meat<sup id="fnref:cereal">[5]</sup> &#40;use a weight if necessary&#41; and put it in the fridge.  A cut of a few hundred grams should take around 24 hours.</p>
<h2 id="cooking">Cooking</h2>
<p>After taking the meat out of the fridge and washing it lightly, wait around an hour for it to reach room temperature.  If you don&#39;t have paper towels, place it on a rack or an elevated plane to dry off the surface.</p>
<p>Before cooking, I like to rub a few extra spices on my meat. My favorites are smoked paprika &#40;for the smoky flavor&#41;, garlic powder, freshly ground black pepper and perhaps some nutmeg.</p>
<p>From here, it&#39;s similar to cooking a steak: you&#39;d want it in an environment close to the target temperature, which is around 68°C, or 63°C if pork in your area is heavily regulated.  The closer it is, the smaller the difference between the center and the outer layers may be, i.e. you&#39;ll be less likely to overcook the latter.  There are three ways<sup id="fnref:threesome">[6]</sup> to do this indoors: sous vide, pan-frying and oven-roasting.  If you have a sous vide machine, I&#39;d assume you wouldn&#39;t need my instructions, so I will focus on the other two methods.</p>
<h3 id="pan-frying">Pan-frying</h3>
<p>First, rub a touch of cooking oil<sup id="fnref:oil">[7]</sup> all over your meat, then turn the stove to the lowest possible heat and place the pan, with the meat, on it.  It should take 30 to 40 minutes to reach the desired temperature, depending on your stove.  You can use your finger or a chopstick to poke the meat: if it feels raw it&#39;s probably raw; if it&#39;s solid it&#39;s overcooked; you&#39;d want it bouncy, right before it stops being so.  Yes, it&#39;s a lot of trial and error and unnecessarily stressful, so just get a thermometer, ideally one you can leave stuck in for the entire process.</p>
<p>It is not compulsory to sear a ham, but I&#39;m addicted to the <a href="https://en.wikipedia.org/wiki/Maillard_reaction">Maillard reaction</a> so Imma do it anyway.  You can sear before or after cooking, I usually do the latter &#40;reverse searing&#41; because it seems to make more sense. Move the meat to a temporary plate and wipe the pan clean.  Turn the stove up to medium-high and wait for it to get hot.</p>
<p>If your meat does not look like it can fit in a bodybuilding contest, coat it with a little more oil, then drop it on the pan.  Rotate it every 30 seconds until the whole surface area is golden brown, then move it back to the plate to rest until you can comfortably touch it before slicing. Serve with yellow mustard.</p>
<h3 id="roasting">Roasting</h3>
<p>If you have an oven, place the meat on its rack and turn it down to the lowest heat &#40;mine is 100°C&#41;.  With this method, a thermometer is also compulsory to monitor the meat&#39;s inner temperature, which should take around 80 minutes to rise to the target.  I suggest bisecting the checking intervals, e.g. check after 40 minutes, then 20, and so on.</p>
<p>If you&#39;re worried about the wasted energy, you can cut some carrots, potatoes, tomatoes and/or onions &#40;anything high in carbs, really&#41; in half and throw them in the oven.  After taking the meat out, turn the oven up to highest and you&#39;ll have some beautifully caramelized side dishes.</p>
<p>The oven I have at home is not powerful enough for searing the meat &#40;quickly&#41; so I usually turn to the pan instead.</p>
<h3 id="slow_cooking_bonus">Slow cooking &#40;bonus&#41;</h3>
<p>This is a bonus because I could never make a ham out of it, only pulled pork. On the other hand, it&#39;s so tender that you won&#39;t be able to slice it, and it needs much less attention.  Since we won&#39;t sear the meat, it&#39;s a good idea to use a binder like mustard to stick even more rubbing spices on the surface.</p>
<p>After rubbing, coat the bottom of the slow cooker with a bit of oil to avoid sticking, drop the bay leaves from the brine in and place the meat on top.  Cook on <em>low</em> for six to eight hours, then separate the muscles from each other using forks or chopsticks.  You can serve immediately or let it cook a bit more after pulling.</p>
<table class="fndef" id="fndef:hamm">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Especially <a href="https://www.youtube.com/watch?v&#61;IiLJsOsRKUI">Jon Hamm&#39;s John Ham</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:equipment">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Ain&#39;t nobody got at smoker at home.</td>
    </tr>
</table><table class="fndef" id="fndef:chemical">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Where can I get nitrates?  A chemistry lab?</td>
    </tr>
</table><table class="fndef" id="fndef:self">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">Like yours truly.</td>
    </tr>
</table><table class="fndef" id="fndef:cereal">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Or the other way around, it&#39;s not cereal.</td>
    </tr>
</table><table class="fndef" id="fndef:threesome">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">Nice&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:oil">
    <tr>
        <td class="fndef-backref">[7]</td>
        <td class="fndef-content">One with smoking point above 170°C.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/gotham@cnx%3E&Subject=Re: Generic Homemade Ham">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/gotham@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/gotham/comments.xml</wfw:commentRss>
</item>
<item>
  <title>NixOS on Btrfs&#43;tmpfs</title>
  <link>https://cnx.gdn/blog/butter/index.html</link>
  <guid>https://cnx.gdn/blog/butter/index.html</guid>
  <description>How I reinstalled NixOS on Btrfs with an amnesiac root and backed up my data</description>
  <category>fun</category><category>recipe</category><category>nix</category>
  <pubDate>Sun, 14 Nov 2021 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="nixos_on_btrfstmpfs">NixOS on Btrfs&#43;tmpfs</h1>
<p>In 2018, dad bought me a new laptop to replace the good ole Compaq nx7010 whose screen unfortunately got infected by some sort of microbe and dieded shortly afterwards.  The new one, whilst having a considerably worse build quality &#40;like all other late-2010s ones when compared to mid-2000s models&#41;, had a dozen times as much storage: a 250 GB M.2 SSD and a 500 GB SATA HDD.</p>
<p>My data hoarding habit has grown exponentially ever since.  Initially, I used to back up the data from the SSD to the HDD but after a few years, I ran out of space and decided to get some more storage.  Instead of buying a portable hard disk like a normal person would, I went for an SATA SSD, as it was rather difficult to find a 7200 rpm 2.5-inch<sup id="fnref:metric">[1]</sup> HDD in the market at the time.</p>
<p>I then asked my father for a spare SATA-to-USB case &#40;he switched to using a dock a while ago, and like other dads, nothing is ever thrown away&#41; and prepared to swap the drives.  As cloning the data would have been too easy, I decided to <em>spice things up</em> by reinstalling the OS.  Back then I was dual-booting Debian and NixOS, but the former had hardly been ever booted for months so it was time to let it go:</p>
<p><img src="https://cnx.gdn/assets/let-it-go.png" alt="Elsa rolling on the floor crying" /></p>
<p>In addition, I wanted to hop on the new and shiny<sup id="fnref:new">[2]</sup> train of Btrfs. It has compression, snapshots and subvolumes, what&#39;s not to love?  Let&#39;s replace something I&#39;d been using for nearly a decade with a file system I had absolutely zero experience with, what could possibly go wrong, right?</p>
<div class="franklin-toc"><ol><li>Reinstallation<ol><li>Preparation</li><li>Partitioning</li><li>Configuration</li><li>Installation</li><li>Profits</li></ol></li><li>Backup<ol><li>Initialization</li><li>Repetition</li></ol></li></ol></div>
<h2 id="reinstallation">Reinstallation</h2>
<p>I was going to reinstall NixOS with an ephemeral root, which had been covered to death in the following brilliant resources:</p>
<ul>
<li><p><a href="https://grahamc.com/blog/erase-your-darlings">Erase your darlings: immutable infrastructure for mutable systems</a></p>
</li>
<li><p><a href="https://elis.nu/blog/2020/05/nixos-tmpfs-as-root">NixOS ❄: tmpfs as root</a></p>
</li>
<li><p><a href="https://github.com/nix-community/impermanence">Nix community&#39;s impermanence modules</a></p>
</li>
<li><p><a href="https://christine.website/blog/paranoid-nixos-2021-07-18">Paranoid NixOS Setup</a></p>
</li>
</ul>
<p>The only twist here is that I was using Btrfs instead of ZFS or ext4 like in other guides.  This choice would influence how to back up in the later section.</p>
<h3 id="preparation">Preparation</h3>
<p>First of all, I temporarily copied data to the SATA SSD from the M.2, including <a href="https://trong.loang.net/~cnx/dotfiles/tree/nix">my Nix configurations</a>.  Using either <code>cp</code> or <code>rsync</code> didn&#39;t seem to make any difference in performance, and in the meantime I also went ahead and grabbed a <a href="https://channels.nixos.org/nixos-unstable">NixOS unstable live image</a> and <code>dd</code>&#39;ed it to a flash drive.  As I&#39;m tracking unstable, installing from the same version would allow me to skip switching the channel and a lot of downloading.</p>
<h3 id="partitioning">Partitioning</h3>
<p>After booting up the live image, I opened up a root shell with <code>sudo -i</code>. As expected, <code>fdisk</code> reported the M.2 SSD as <code>/dev/nvme0n1</code>.  Paranoid as always, I decided to give the EFI system partition a whole gibibyte, swap eight to match memory<sup id="fnref:memory">[3]</sup> and the rest as a single chonky Btrfs partition:</p>
<pre><code class="language-sh">parted /dev/nvme0n1 -- mklabel gpt
parted /dev/nvme0n1 -- mkpart ESP fat32 1MiB 1GiB
parted /dev/nvme0n1 -- set 1 boot on
mkfs.vfat /dev/nvme0n1p1

parted /dev/nvme0n1 -- mkpart Swap linux-swap 1GiB 9GiB
mkswap -L Swap /dev/nvme0n1p2
swapon /dev/nvme0n1p2

parted /dev/nvme0n1 -- mkpart primary 9GiB 100&#37;
mkfs.btrfs -L Butter /dev/nvme0n1p3</code></pre>
<p>As I typed this, I realized that I should have set up encryption for the last partition so I would probably need to reinstall in the near future to fix this mistake.  Anyway, with the target system&#39;s root mounted as tmpfs, I would need to persist <code>/nix</code> &#40;obviously&#41;, <code>/etc</code> &#40;mostly for authentication and other secret stuff not included in <code>configuration.nix</code> that I was too lazy to opt in individually&#41;, <code>/var/log</code>, <code>/root</code> and <code>/home</code>:</p>
<pre><code class="language-sh">mount /dev/nvme0n1p3 /mnt
btrfs subvolume create /mnt/nix
btrfs subvolume create /mnt/etc
btrfs subvolume create /mnt/log
btrfs subvolume create /mnt/root
btrfs subvolume create /mnt/home
umount /mnt</code></pre>
<p>Most subvolumes can be mounted with <code>noatime</code>, except for <code>/home</code> where I frequently need to sort files by modification time.  All of them should have forced compression though:</p>
<pre><code class="language-sh">mount -t tmpfs -o mode&#61;755 none /mnt
mkdir -p /mnt/&#123;boot,nix,etc,var/log,root,home&#125;
mount /dev/nvme0n1p1 /mnt/boot
mount -o subvol&#61;nix,compress-force&#61;zstd,noatime /dev/nvme0n1p3 /mnt/nix
mount -o subvol&#61;etc,compress-force&#61;zstd,noatime /dev/nvme0n1p3 /mnt/etc
mount -o subvol&#61;log,compress-force&#61;zstd,noatime /dev/nvme0n1p3 /mnt/var/log
mount -o subvol&#61;root,compress-force&#61;zstd,noatime /dev/nvme0n1p3 /mnt/root
mount -o subvol&#61;home,compress-force&#61;zstd /dev/nvme0n1p3 /mnt/home</code></pre>
<h3 id="configuration">Configuration</h3>
<p>With everything mounted, <code>nixos-generate-config --root /mnt</code> could be run to generate a basic configuration.  But wait, didn&#39;t I say something about my dot files?  That&#39;s correct, but it&#39;s not easy to handcraft the <code>hardware-configuration.nix</code>.  After making sure all are mounted with the right options and <code>services.fstrim.enable</code> is <code>true</code>, I copied other configuration files to <code>/etc/nixos</code> and finished this step.</p>
<h3 id="installation">Installation</h3>
<p>NixOS installation is as simple as running <code>nixos-install</code>.  But my job was not done after setting the root password and rebooting into the new system. It was working, but not functional: there was nothing meaningful for me to do on it, so I had to log in &#40;as root&#41;, <code>passwd</code> the user and copy the home folder back from the temporary drive.</p>
<p>After freeing the new SATA SSD, I also filled it with butter.  Yes, all the way, no GPT, no MBR, just Btrfs, whose subvolumes were used in place of partitions:</p>
<pre><code class="language-sh">mkfs.btrfs -f -L Fly /dev/sdb
mkdir -p /mnt
mount /dev/sdb /mnt
btrfs subvolume create /mnt/movies</code></pre>
<p>At that time the only disposable data I had was my movie collection. The HDD also contained other data but they were rebalanced to <code>/home</code> &#40;on the M.2&#41;.  After swapping the SATA SSD into the laptop, I logged in as the normal user and got the exact same environment as before the reinstallation.</p>
<h3 id="profits">Profits</h3>
<p>Thanks to subvolumes and compression, the free space was no longer fragmented and I think I gained like 100 GB &#40;not counting the old Debian&#39;s root&#41;.  Backups would also be less painful with Btrfs snapshots &#40;instead of plain <code>rsync</code> like I used to&#41;, as shown below.</p>
<h2 id="backup">Backup</h2>
<p>With all data migrated, the HDD could be used for backing up.  First, some legacy data I no longer access were moved there, then I started to back up my <code>/home</code> partition:</p>
<h3 id="initialization">Initialization</h3>
<p>Having learned my lesson, I did not forget to set up <a href="https://gitlab.com/cryptsetup/cryptsetup">LUKS</a> this time:</p>
<pre><code class="language-sh">cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb backup</code></pre>
<p>To make use of snapshots, the backup drive gotta be Btrfs as well. The compression level was turned up to 14 this time &#40;default was 3&#41;:</p>
<pre><code class="language-sh">mkfs.btrfs -L Backup /dev/mapper/backup
mkdir /backup
mount -o noatime,compress-force&#61;zstd:14 /dev/mapper/backup /backup</code></pre>
<p>Following <a href="https://btrfs.wiki.kernel.org/index.php/Incremental_Backup">Btrfs Wiki</a>, I made the first <code>/home</code> snapshot and sent it to the backup drive:</p>
<pre><code class="language-sh">btrfs subvolume create /backup/home
today&#61;&#36;&#40;date --iso-8601&#41;
btrfs subvolume snapshot -r /home /home/&#36;today
sync
btrfs send /home/&#36;today | btrfs receive /backup/home
sync</code></pre>
<h3 id="repetition">Repetition</h3>
<p>For next backups, I also mounted the drive and created a snapshot:</p>
<pre><code class="language-sh">cryptsetup luksOpen /dev/sdb backup
mkdir -p /backup
mount -o noatime,compress-force&#61;zstd:14 /dev/mapper/backup /backup
today&#61;&#36;&#40;date --iso-8601&#41;
btrfs subvolume snapshot -r /home /home/&#36;today
sync</code></pre>
<p>Say the latest snapshot was from the <code>&#36;previous</code> day; I only needed to send the difference between the old and new backup.  Afterwards, it was safe to delete the local <code>&#36;previous</code> snapshot to save some space.</p>
<pre><code class="language-sh">btrfs send -p /home/&#36;previous /home/&#36;today | btrfs receive /backup/home
btrfs subvolume delete /home/&#36;previous
sync</code></pre>
<p>Finally, unmount the drive and close the LUKS volume:</p>
<pre><code class="language-sh">umount /backup
cryptsetup luksClose backup</code></pre>
<p>Is this more complicated than good ole <code>rsync</code>?  Yes.  Is it safer?  Also yes, thanks to copy-on-write.  Would I bother using one of the tools suggested in the wiki?  Probably not, I&#39;ve already documented everything in this article in case I forget anything.</p>
<table class="fndef" id="fndef:metric">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">63.5 mm for those outside of the land of guns and burgers</td>
    </tr>
</table><table class="fndef" id="fndef:new">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">OK, maybe not new, but certainly shinny</td>
    </tr>
</table><table class="fndef" id="fndef:memory">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Slightly larger since some of the memory is dedicated to graphics</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/butter@cnx%3E&Subject=Re: NixOS on Btrfs&#43;tmpfs">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/butter@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/butter/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Writing a Clipboard Manager</title>
  <link>https://cnx.gdn/blog/threa/index.html</link>
  <guid>https://cnx.gdn/blog/threa/index.html</guid>
  <description>Raku&#39;s concision demonstrated in form of a tutorial</description>
  <category>fun</category><category>recipe</category><category>clipboard</category>
  <pubDate>Sat, 03 Jul 2021 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="writing_a_clipboard_manager">Writing a Clipboard Manager</h1>
<div class="admonition note"><p class="admonition-title">A word of protest</p><p>This was intended to be presented in <a href="https://conf.raku.org">The Raku Conference</a>, however the organizers insisted on using <a href="https://stallman.org/zoom.html">Zoom</a> and <a href="https://stallman.org/skype.html">Skype</a>, which are privacy invasive platforms running on proprietary software and shadily managed.</p>
</div>
<div class="franklin-toc"><ol><li>Motivation</li><li>Inspirations and Design</li><li>Daemon Implementation<ol><li>Reading Inputs</li><li>Cache Directory Setup</li><li>Comparing and Saving Selections</li><li>Command-Line Interface</li></ol></li><li>Client Implementation<ol><li>Back-End</li><li>Front-End</li></ol></li><li>Conclusion</li></ol></div>
<h2 id="motivation">Motivation</h2>
<p>Clipboard management is very important to my workflow.  To me, a clipboard manager is useful in two ways:</p>
<ol>
<li><p>It extends my &#40;rather poor&#41; temporary mundane memory by caching a few dozens of most recent selections.</p>
</li>
<li><p>It synchronizes clipboard and primary selections. Since some programs only support one kind of selection, this is particularly useful.</p>
</li>
</ol>
<p>For the first point, I have to be able to choose from the history by pressing a few keystrokes.  Having to touch the mouse during writing sessions is unacceptable.  The menu dropping down from the systray is also undesirable because I have a multi-monitor setup.  This narrows it down to only one plausible option: <a href="https://launchpad.net/diodon">Diodon</a>, which I had been using on Debian for at least two years.  However, as I was migrating to NixOS earlier last month, <a href="https://github.com/NixOS/nixpkgs/pull/126190">I was unable to package it for Nix</a>.</p>
<p>Naturally, I went looking for <a href="https://search.nixos.org/packages?query&#61;clip">alternatives</a>, most of which I had tried before and did not satisfy my requirements.  <a href="https://github.com/cdown/clipmenu">clipmenu</a> got my attention however: it was made to work with dmenu&#40;-compliant launchers&#41;, which I had a rather nice experience with in <a href="https://sxmo.org">Sxmo</a> on my <a href="https://www.pine64.org/pinephone">PinePhone</a>. However, I use <a href="https://awesomewm.org">awesome</a> on my workstation and its widget toolkit covers my launcher and menu need perfectly.  I don&#39;t need <a href="https://tools.suckless.org/dmenu">dmenu</a> and do not wish to spend time configuring and theming it.  Plus, the architecture of dmenu scripts and awesome widgets vastly differs: while awesome executes the input programs, dmenu is called from the scripts.</p>
<h2 id="inspirations_and_design">Inspirations and Design</h2>
<p>As even the most plausible candidate is not a suitable replacement, I would need to write my own clipboard manager.  clipmenu is not really a good base though because it&#39;s written in shell script, something I ain&#39;t fluent in.<sup id="fnref:1">[1]</sup>  Its idea is brilliant however:</p>
<blockquote>
<ol>
<li><p><code>clipmenud</code> uses <code>clipnotify</code> to wait for new clipboard events.</p>
</li>
<li><p>If <code>clipmenud</code> detects changes to the clipboard contents, it writes them out to the cache directory and an index using a hash as the filename.</p>
</li>
</ol>
</blockquote>
<p>I later translated <a href="https://github.com/cdown/clipnotify">clipnotify</a> to <a href="https://ziglang.org">Zig</a><sup id="fnref:2">[2]</sup> and called it <a href="https://trong.loang.net/~cnx/clipbuzz">clipbuzz</a>.<sup id="fnref:3">[3]</sup> From clipbuzz&#39;s usage,</p>
<pre><code class="language-sh">while clipbuzz
do # something with xclip or xsel
done</code></pre>
<p>and this is exactly how yet another clipboard manager was written, but before we get there, let&#39;s talk about this article&#39;s sponsor&#33;</p>
<p>I&#39;m kidding d-; though we cannot jump into the implementation just yet: we have only resolved the first of two points.  How about the data structure?  Hashing sounds like overengineering in this case: nobody needs more than a few dozen entries<sup id="fnref:4">[4]</sup> and hashes are not very memorable.  Printable characters can serve much better as indices.</p>
<p>What?  What happens when we run out of them?  We reuse/recycle them&#33;<sup id="fnref:5">[5]</sup>  They would also fit within a single line; heck, we can just store all of them in order inside a file and rotate each time there&#39;s a new selection.  Picking would just be moving a char to the beginning.  The entire cache directory can look something like this:</p>
<pre><code class="language-console">&#36; ls &#36;XDG_CACHE_HOME/&#36;project
order
R
A
K
U</code></pre>
<p>Wait, is that a sign?  We must use <a href="https://raku.org">Raku</a> to implement <code>&#36;project</code> then… Speaking of <code>&#36;project</code>, I planned to use it with awesome and <a href="https://vicious.rtfd.io">vicious</a> so let&#39;s call it something brutal, like a <em>cutting board</em>, which is <em>thớt</em> in Vietnamese, an Internet slang for <em>thread</em>.  Cool, now we have the daemon name, and conventionally the client shall be <em>threac</em>, or <em>threa client</em>.</p>
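<p>To make the bookkeeping concrete, here is the rotation scheme sketched in Python &#40;purely illustrative, not the actual implementation&#41;: a new selection recycles the least recent index and moves it to the front, and picking an entry moves its index to the front likewise.</p>
<pre><code class="language-python">def record(order):
    """Recycle the least recent index for a new selection."""
    last = order[-1]                 # least recent index
    return last, last + order[:-1]  # its file gets the new content

def pick(order, choice):
    """Move the picked index to the front."""
    return choice + order.replace(choice, "")

index, order = record("RAKU")  # index == "U", order == "URAK"
order = pick(order, "K")       # order == "KURA"</code></pre>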
<h2 id="daemon_implementation">Daemon Implementation</h2>
<h3 id="reading_inputs">Reading Inputs</h3>
<p>Raku was chosen<sup id="fnref:6">[6]</sup> for the ease of text manipulation and seamless interfacing with external programs.  I learned it quite a while ago and had always been waiting for a chance to do something more practical with it than competitive programming, which isn&#39;t a good fit due to Rakudo&#39;s poor performance.  In Raku, the snippet from clipbuzz&#39;s README becomes:</p>
<pre><code class="language-sh">while run &#39;clipbuzz&#39; &#123;
    # do something with xclip or xsel
&#125;</code></pre>
<p>Out of all the languages I know, this is by far the simplest way to <a href="https://docs.raku.org/routine/run">run</a> an external program.  Most would require one to import something or do something with the call&#39;s return value, and don&#39;t even get me started on the POSIX <em>fork</em>-and-<em>exec</em> model.</p>
<p>OK, now what are we gonna do with <code>xclip</code>?  One obvious thing would be to read the current selection.  Raku got you covered, fam:</p>
<pre><code class="language-sh">my &#36;selection &#61; qx/xclip -out/;</code></pre>
<p>Remember when I said Raku can seamlessly interact with external programs?  <a href="https://docs.raku.org/syntax/qx">qx</a> is how you capture their standard output; it really is that simple.  But wait, which selection is that?  No worries, <code>xclip</code> supports both primary and clipboard:</p>
<pre><code class="language-sh">my &#36;primary &#61; qx/xclip -out -selection primary/;
my &#36;clipboard &#61; qx/xclip -out -selection clipboard/;</code></pre>
<h3 id="cache_directory_setup">Cache Directory Setup</h3>
<p>This is when we write those selections down for later use, right?  Well, we need to figure out where to save them first.  According to the <a href="https://specifications.freedesktop.org/basedir-spec/latest/ar01s03.html">XDG Base Directory Specification</a>, <code>&#36;XDG_CACHE_HOME</code> shall fall back to <code>&#36;HOME/.cache</code>:</p>
<pre><code class="language-sh">my &#36;XDG_CACHE_HOME &#61; &#37;*ENV&lt;XDG_CACHE_HOME&gt; // path &#36;*HOME / &#39;.cache&#39;:;</code></pre>
<p>For convenience, I defined the <code>/</code> operator as an alias for path concatenation:</p>
<pre><code class="language-sh">multi sub infix:&lt;/&gt;&#40;&#36;parent, &#36;child&#41; &#123; add &#36;parent: &#36;child &#125;</code></pre>
<p>With <code>&#36;XDG_CACHE_HOME</code> defined, we can prepare the base directory as follows:</p>
<pre><code class="language-sh">my &#36;base &#61; &#36;XDG_CACHE_HOME.IO / &#39;threa&#39;;
mkdir &#36;base: unless &#36;base.e;
die &quot;thread: &#36;base: File exists&quot; when &#36;base.f;</code></pre>
<p>As <a href="https://vrurg.github.io/2021/06/16/article-on-roles">a wise man once said</a>,</p>
<blockquote>
<p>As it often happens, writing an article ends up with a bug found in Rakudo.</p>
</blockquote>
<p>In this case, there&#39;s a <a href="https://github.com/MoarVM/MoarVM/pull/1507">bug in mkdir</a> that makes it happily return even if the target path is a file.  I&#39;m trying to fix it at the moment but <a href="https://github.com/rakudo/rakudo/pull/4408">a test</a> is still failing.  <em>Update: it passed after a maintainer bumped the dependencies to the patched version.</em></p>
<p>Anyway, back to our clipboard manager.  Here we are using <a href="https://docs.raku.org/language/control">flow controllers</a> such as <code>unless</code> and <code>when</code> in the form of <em>statement modifiers</em>, which can be easier on the eyes by keeping the code flat.  Existence checks like <code>e</code> &#40;exists&#41; and <code>f</code> &#40;file&#41; are also really handy.  Next, we check on the <code>order</code>:</p>
<pre><code class="language-sh">constant &#36;ALNUM &#61; &#39;ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&#39;;

sub valid&#40;&#36;path&#41; &#123;
    return False unless &#36;path.f;
    so /^\w\w&#43;&#36;/ &amp;&amp; .chars &#61;&#61; .comb.unique given trim slurp &#36;path
&#125;

my &#36;order &#61; &#36;base / &#39;order&#39;;
spurt &#36;order, &#36;ALNUM unless valid &#36;order;</code></pre>
<p>Instead of all printable characters, we only allow alphanumerics, falling back to the uppercase letters plus digits &#40;mainly because my screen can only fit as much vertically&#41; unless <code>&#36;XDG_CACHE_HOME/threa/order</code> is a file exclusively containing at least two unique alphanumerics.  Reading and writing files in Raku is incredibly trivial: just <code>slurp</code> and <code>spurt</code> the path.  Since we are not interested in whitespace, it is <code>trim</code>&#39;med from <code>order</code>.  Notice that Raku allows subroutines to be called without any parentheses—I love Lisp, but the opening parenthesis <em>after</em> the function name always confuses me, especially when nested.</p>
<p>As you might have guessed, <code>given</code> is another statement modifier setting the <a href="https://docs.raku.org/syntax/&#36;_">topic variable</a> that is particularly useful in <a href="https://raku-advent.blog/2020/12/22/draft-whats-the-point-of-point-free-programming">pointfree programming</a>, where regular expressions &#40;e.g. <code>/^\w\w&#43;&#36;/</code>&#41; are matched against directly and methods are called without specifying the object.  Raku is also a weakly-typed language: <code>.comb.unique</code> &#40;a list of unique characters&#41; is coerced into an integer when compared to one &#40;number of <code>.chars</code>&#41;.</p>
<h3 id="comparing_and_saving_selections">Comparing and Saving Selections</h3>
<p>What do we do with the order then?  First we can determine the latest selection and compare it to the ones we got from <code>xclip</code> earlier to see which one is really new.  We&#39;ll also need to rotate the order, i.e. write the new selection to the <code>&#36;last</code> file and move it in front of the others that we <code>&#36;keep</code> as-is:</p>
<pre><code class="language-sh">my &#40;&#36;first, &#36;keep, &#36;last&#41; &#61; do
    given trim slurp &#36;order &#123; .comb.first, .chop, .substr: *-1 &#125;
my &#40;&#36;other, &#36;content&#41; &#61; do given try slurp &#36;base / &#36;first or &#39;&#39; &#123;
    when * ne &#36;primary &#123; &#39;clipboard&#39;, &#36;primary &#125;
    when * ne &#36;clipboard &#123; &#39;primary&#39;, &#36;clipboard &#125;
&#125;</code></pre>
<p>On the first few runs, the cache files probably don&#39;t exist just yet, so we fall back to empty contents using <code>try ... or ...</code>.  We need to know the <code>&#36;other</code> &#40;outdated&#41; selection to later synchronize the two.  In case of reselection, neither is updated and we simply skip this iteration:</p>
<pre><code class="language-sh">next unless &#36;other;</code></pre>
<p>Otherwise, let&#39;s go ahead, write down the <code>&#36;content</code>, rotate <code>&#36;order</code> and synchronize with the <code>&#36;other</code> selection:</p>
<pre><code class="language-sh">my &#36;path &#61; &#36;base / &#36;last;
spurt &#36;path, &#36;content;
spurt &#36;order, &#36;last ~ &#36;keep;
run &lt;xclip -in -selection&gt;, &#36;other, &#36;path</code></pre>
<p>That&#39;s it, now put the daemon in <code>&#36;PATH</code> and run it in <code>~/.xinitrc</code> or something IDK.  If you&#39;re worried that some selection might be too big to read in time before the next event, asynchronize the <code>qx</code> calls by prefixing them with <code>start</code>, and <code>await</code> the results later on.  It is <em>that</em> easy.</p>
<h3 id="command-line_interface">Command-Line Interface</h3>
<p>Hol up, what if I want to store the cache elsewhere or use another set of characters?  <em>&quot;Then you can go right ahead and have an intercourse with yourself, you ungrateful little piece of &#91;redacted&#93;.&quot;</em>  I would have said this were I to implement this in other languages, but luckily I got Raku, and Raku got <code>sub MAIN</code>:</p>
<pre><code class="language-sh">sub MAIN&#40;
  :&#36;children where /^\w\w&#43;&#36;/ &#61; &#36;ALNUM, #&#61; alphanumerics
  :&#36;parent &#61; &#36;XDG_CACHE_HOME           #&#61; cache path
&#41; &#123;
    my &#36;snowflakes &#61; &#36;children.comb.unique.join;
    my &#36;base &#61; &#36;parent.IO / &#39;threa&#39;;
    my &#36;order &#61; &#36;base / &#39;order&#39;;

    while run &#39;clipbuzz&#39; &#123;
        ...
        spurt &#36;order, &#36;snowflakes unless valid &#36;order;
        ...
    &#125;
&#125;</code></pre>
<p>No matter how cool you think this is, it is cooler, I mean, look:</p>
<pre><code class="language-console">&#36; thread --help
Usage:
  thread &#91;--children&#91;&#61;Str where &#123; ... &#125;&#93;&#93; &#91;--parent&#61;&lt;Str&gt;&#93;
  
    --children&#91;&#61;Str where &#123; ... &#125;&#93;    alphanumerics
    --parent&#61;&lt;Str&gt;                    cache path</code></pre>
<h2 id="client_implementation">Client Implementation</h2>
<h3 id="back-end">Back-End</h3>
<p>Following the Unix™ philosophy, <code>threac</code> will do only one thing and do it well: it shall take the chosen selection and <em>schedule</em> it to move to the beginning:</p>
<pre><code class="language-sh">my &#36;XDG_CACHE_HOME &#61; &#37;*ENV&lt;XDG_CACHE_HOME&gt; // path add &#36;*HOME: &#39;.cache&#39;:;

sub MAIN&#40;
   &#36;choice where /^\w?&#36;/,    #&#61; alphanumeric
  :&#36;parent &#61; &#36;XDG_CACHE_HOME #&#61; cache path
&#41; &#123;
    my &#36;base &#61; &#36;parent.IO.add: &#39;threa&#39;;
    my &#36;order &#61; add &#36;base: &#39;order&#39;;
    spurt &#36;order, S/&#36;choice&#40;.*&#41;/&#36;0&#36;choice/ with &#36;order.slurp;
    my &#36;path &#61; &#36;base.add: &#36;choice;
    run &#39;xclip&#39;, &#36;path
&#125;</code></pre>
<p>The highlight here is the non-destructive substitution <code>S///</code>, which allows regex substitution in a pointfree and pure manner.  Though instead of moving <code>&#36;choice</code> to the top of the deque, we place it at the bottom and use <code>xclip</code> to trigger the daemon, which does the moving and synchronizes the selections.</p>
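<p>In case the substitution looks cryptic, its effect can be mimicked with <code>re.sub</code> in Python &#40;an illustration only; since the choice is a single alphanumeric, the pattern contains no regex metacharacters&#41;:</p>
<pre><code class="language-python">import re

def schedule(order, choice):
    """Move choice to the bottom, for the daemon to recycle next."""
    return re.sub(choice + "(.*)", r"\1" + choice, order)

# e.g. schedule("KURA", "U") == "KRAU"</code></pre>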
<h3 id="front-end">Front-End</h3>
<p>Note that <code>threac</code> does not give any output: the selection history is &#40;by default&#41; stored in a standard and convenient location, to be read by any front-end of choice.  For awesome, I made a menu in which each entry is wired to <code>threac</code> and <code>xdotool</code> &#40;to simulate a primary paste with <code>S-Insert</code>&#41;, and bound the whole thing to a keyboard shortcut.</p>
<pre><code class="language-lua">local base &#61; os.getenv&#40;&quot;HOME&quot;&#41; .. &quot;/.cache/threa/&quot;
local command &#61; &quot;threac &#37;s &amp;&amp; xdotool key shift&#43;Insert&quot;
local f &#61; io.open&#40;base .. &quot;order&quot;&#41;
local order &#61; f:read&#40;&quot;*a&quot;&#41;
f:close&#40;&#41;

local items &#61; &#123;&#125;
for c in order:gmatch&#40;&quot;.&quot;&#41; do
  local f &#61; io.open&#40;base .. c&#41;
  table.insert&#40;items, &#123;f:read&#40;&quot;*a&quot;&#41;:gsub&#40;&quot;\n&quot;, &quot; &quot;&#41;, function &#40;&#41;
    awful.spawn.with_shell&#40;command:format&#40;c&#41;&#41;
  end&#125;&#41;
  f:close&#40;&#41;
end
awful.menu&#123;items &#61; items, theme &#61; &#123;width &#61; 911&#125;&#125;:show&#40;&#41;</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>Through writing the clipboard manager <a href="https://sr.ht/~cnx/threa">threa</a>, which is released under <a href="https://www.gnu.org/licenses/gpl-3.0">GNU GPLv3&#43;</a> on <a href="https://sourcehut.org">SourceHut</a>, we have discovered a few features of Raku that make it a great <em>scripting</em> language:</p>
<ul>
<li><p>Out-of-box CLI support:</p>
<ul>
<li><p>Running programs and capturing output</p>
</li>
<li><p>Environment variables</p>
</li>
<li><p>File system operations</p>
</li>
<li><p>Builtin argument parser</p>
</li>
</ul>
</li>
<li><p>Concision:</p>
<ul>
<li><p>Statement modifiers</p>
</li>
<li><p>Topic variable</p>
</li>
<li><p>First-class regex</p>
</li>
<li><p>Trivial asynchronization</p>
</li>
</ul>
</li>
</ul>
<p>As a general-purpose programming language, Raku has other classes of characteristics that make it useful in larger projects, such as grammars &#40;i.e. regex on steroids&#41; and OOP for human beings.  It is a truly versatile language and I really hope my words can convince someone new to try it out&#33;</p>
<table class="fndef" id="fndef:1">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">I ain&#39;t proud of this, okay?</td>
    </tr>
</table><table class="fndef" id="fndef:2">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">I&#39;m obsessed with exotic languages.</td>
    </tr>
</table><table class="fndef" id="fndef:3">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">The <em>z</em>&#39;s are for Zig, how original, I know.</td>
    </tr>
</table><table class="fndef" id="fndef:4">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">&#91;<em>citation needed</em>&#93;</td>
    </tr>
</table><table class="fndef" id="fndef:5">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Wow much environment&#33;</td>
    </tr>
</table><table class="fndef" id="fndef:6">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">By some supernatural being of course&#33;</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/threa@cnx%3E&Subject=Re: Writing a Clipboard Manager">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/threa@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/threa/comments.xml</wfw:commentRss>
</item>
<item>
  <title>To Poo or Not to Poo</title>
  <link>https://cnx.gdn/blog/nopoo/index.html</link>
  <guid>https://cnx.gdn/blog/nopoo/index.html</guid>
  <description>Me experimenting with #nopoo</description>
  <category>lyf</category><category>exp</category>
  <pubDate>Sun, 23 May 2021 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="to_poo_or_not_to_poo">To Poo or Not to Poo</h1>
<p>In late April 2021, Việt Nam witnessed the beginning of the fourth wave of SARS-CoV-2 after a few months without any community case.  Soon enough, students were told not to come to their schools&#39; campuses.  This happened while I was an intern at <a href="https://usth.edu.vn">USTH</a> <a href="https://ictlab.usth.edu.vn">ICTLab</a>, so I was advised to work remotely as well.  I had asked for this at the start of the internship, but my supervisor was rather reluctant, since there were multiple interns working together and communication in person might be the most effective.  Working from home was beneficial to me in a few important ways:</p>
<ol>
<li><p>I had a three-monitor setup at home and a more comfortable space.</p>
</li>
<li><p>I could have more flexible working hours at home.</p>
</li>
<li><p>I did not have to bike back and forth to the lab &#40;which is 4 km away&#41; twice a day<sup id="fnref:1">[1]</sup>, which could be exhausting in the hot summer.</p>
</li>
</ol>
<p>Thanks to the last point, I also sweated a lot less, and as I no longer had to maintain a public appearance, I decided to give <code>#nopoo</code> a try.  I had been aware of the practice for quite a few years, but had never thought of actually implementing it until I saw <a href="https://www.youtube.com/watch?v&#61;T-_HKFjxVl0">Johnny Harris&#39; vlog</a>, which I can only describe as <em>intriguing</em>.  TL;DW: the journalist maintained that shampoos generally washed away <em>his</em> scalp&#39;s natural oil, and in combination with other hair products made the scalp itchy and unhealthy.  <em>His</em> solution was to drop the use of all products completely, and so far it had been working <em>for him</em>.<sup id="fnref:2">[2]</sup></p>
<p>Well, my head was itchy sometimes &#40;still itchy at the time of writing&#41;, alors, <a href="https://polytechnique.edu">pour la patrie, les sciences et la gloire</a>, let&#39;s do it&#33;</p>
<h2 id="day_one">Day One</h2>
<p>I was going full no poo: no soap, no baking soda, no vinegar, <em>just water, raw water</em>.  Everything was going as expected; my hair was not as fluffy as usual after washing, but it was easier to get in shape.  I didn&#39;t really style my hair.  Not as a fashion statement, I was &#40;still am&#33;&#41; just rather lazy.  Usually this wasn&#39;t an issue, except when my hair was long it tended to cover my forehead, ears and eyes, which was arguably an uncomfortable experience.  Having the hair stay in place was indeed a blessing&#33;</p>
<h2 id="day_two">Day Two</h2>
<p>My hair started to feel thicker and running hands through it no longer felt stimulating.  On the bright side, it looked fabulous and did not itch.</p>
<h2 id="day_four">Day Four</h2>
<p>My hair and scalp began to feel greasy.  I guess it was because I did not wash them thoroughly that day.  With just water, one would need to put more effort into scrubbing the hair and especially the scalp to return them to a comfortable state.  Plus, my mental state got worse, so my perceived experience could be exaggeratedly negative.</p>
<h2 id="day_five">Day Five</h2>
<p>I worked out and paid more attention to the hair washing process. It felt noticeably better.</p>
<h2 id="day_six">Day Six</h2>
<p>The brief revival of my mental health did not last very long:<sup id="fnref:3">[3]</sup> later that day I was completely on autopilot and accidentally poo&#39;ed myself.  It felt fluffy again, but I was disappointed that things did not go as planned.</p>
<h2 id="day_seven">Day Seven</h2>
<p>I decided to cut my hair.  I had been cutting it myself for a decade when I wrote this, but I got neither better nor faster at it, so it only happened twice or thrice a year.  Of course I had to poo myself afterwards to get rid of all the tiny pieces.</p>
<h2 id="day_eleven">Day Eleven</h2>
<p>Fast forward a few days, it started to feel greasy again, but this time the hair was shorter so it was less of an issue.  I began to apply <a href="https://en.wikipedia.org/wiki/Saline_&#40;medicine&#41;">saline</a> to the hair after washing, and somehow it helped a lot in improving the situation.  Saline was also my solution for facial acne in my teenage years &#40;along with fingernail and pillowcase hygiene&#41;.</p>
<h2 id="day_fifteen">Day Fifteen</h2>
<p>At this point the experience had become more stable.  My scalp still itched occasionally, but seemingly less often than when I was poo&#39;ing more regularly.  The hair stayed in shape with barely any effort &#40;I didn&#39;t even use a comb&#41;.</p>
<p>Overall, the difference is otherwise barely noticeable, but I think I will keep holding my poo for another while, probably in the long term.  Do not let my experience speak for you, however: try it yourself if you are interested, but keep observing the effects objectively.</p>
<table class="fndef" id="fndef:1">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">I usually had lunch at home with my parents.</td>
    </tr>
</table><table class="fndef" id="fndef:2">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Emphases <em>his</em>.<sup id="fnref:4">[4]</sup></td>
    </tr>
</table><table class="fndef" id="fndef:3">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">I later discovered that this was due to the lack of <a href="https://www.sunlightdish.com">sunlight</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:4">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">He stressed that this might not be the case for everyone.<sup id="fnref:5">[5]</sup></td>
    </tr>
</table><table class="fndef" id="fndef:5">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">OK, I get it, footnotes are distracting.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/nopoo@cnx%3E&Subject=Re: To Poo or Not to Poo">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/nopoo@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/nopoo/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Google Summer of Code 2020</title>
  <link>https://cnx.gdn/blog/2020/gsoc/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/index.html</guid>
  <description>GSoC 2020 final report</description>
  <category>fun</category><category>exp</category><category>gsoc</category><category>pkg</category><category>pip</category>
  <pubDate>Mon, 31 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="google_summer_of_code_2020">Google Summer of Code 2020</h1>
<p>In the summer of 2020, I worked with the contributors of <code>pip</code>, trying to improve the networking performance of the package manager.  Admittedly, at the end of <a href="https://summerofcode.withgoogle.com/archive/2020/projects/6238594655584256">the internship</a> period, <a href="https://cnx.gdn/blog/2020/gsoc/article/7/#the_benchmark">the benchmark said otherwise</a>; still, I really hope the clean-up and minor fixes I happened to be doing to the codebase over the summer, in addition to the implementation of parallel utils and lazy wheel, might actually help the project.</p>
<p>Personally, I learned a lot: not just about Python packaging and networking stuff, but also about how to work with others.  I am really grateful to <a href=https://github.com/pradyunsg>@pradyunsg</a> &#40;my mentor&#41;, <a href=https://github.com/chrahunt>@chrahunt</a>, <a href=https://github.com/uranusjr>@uranusjr</a>, <a href=https://github.com/pfmoore>@pfmoore</a>, <a href=https://github.com/brainwane>@brainwane</a>, <a href=https://github.com/sbidoul>@sbidoul</a>, <a href=https://github.com/xavfernandez>@xavfernandez</a>, <a href=https://github.com/webknjaz>@webknjaz</a>, <a href=https://github.com/jaraco>@jaraco</a>, <a href=https://github.com/deveshks>@deveshks</a>, <a href=https://github.com/gutsytechster>@gutsytechster</a>, <a href=https://github.com/dholth>@dholth</a>, <a href=https://github.com/dstufft>@dstufft</a>, <a href=https://github.com/cosmicexplorer>@cosmicexplorer</a> and <a href=https://github.com/ofek>@ofek</a>.  While this feels like a long shout-out list, it really isn&#39;t.  These people are the maintainers and contributors of <code>pip</code> and/or other Python packaging projects, and more importantly, they have been more than helpful, encouraging and patient with me throughout all my activities, showing me the way when I was lost, correcting me when I was wrong, putting up with my carelessness and showing me support across different social media.</p>
<p>To best serve the community, below I have tried my best to document what I did, how I did it and why I did it over the last three months.  At the time of writing, some work is still in progress, so these notes also serve as a reference point for myself and others to reason about decisions in relevant topics.</p>
<div class="franklin-toc"><ol><li>The Main Story<ol><li>Act One: Parallelization Utilities</li><li>Act Two: Lazy Wheels</li><li>Act Three: Late Downloading</li><li>Act Four: Batch Downloading in Parallel</li></ol></li><li>The Plot Summary</li></ol></div>
<h2 id="the_main_story">The Main Story</h2>
<p>The storyline can be divided into the following four main acts.</p>
<h3 id="act_one_parallelization_utilities">Act One: Parallelization Utilities</h3>
<p>In this first act, I ensured the portability of the parallelization measures for later use in the final act.  Multithreading and multiprocessing <code>map</code> were given proper fallbacks on platforms without full support.</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8320>GH-8320</a>: Add utilities for parallelization &#40;close <a href=https://github.com/pypa/pip/pull/8169>GH-8169</a>&#41;</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8538>GH-8538</a>: Make <code>utils.parallel</code> tests tear down properly</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8504>GH-8504</a>: Parallelize <code>pip list --outdated</code> and <code>--uptodate</code> &#40;using <a href=https://github.com/pypa/pip/pull/8320>GH-8320</a>&#41;</p>
</li>
</ul>
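<p>The gist of such a fallback can be sketched in Python &#40;an illustration of the idea, not pip&#39;s actual code&#41;: try a thread-based pool, and degrade to the builtin serial <code>map</code> where the underlying primitives are unavailable.</p>
<pre><code class="language-python">def parallel_map(func, iterable):
    """Map func over iterable, in parallel when the platform allows."""
    try:
        # Thread-based pool; the import can fail on platforms
        # without full multiprocessing support (e.g. no sem_open).
        from multiprocessing.dummy import Pool
        with Pool(4) as pool:
            return pool.map(func, iterable)
    except (ImportError, OSError):
        # Fall back to a plain sequential map.
        return list(map(func, iterable))</code></pre>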
<h3 id="act_two_lazy_wheels">Act Two: Lazy Wheels</h3>
<p>As proposed by <a href=https://github.com/cosmicexplorer>@cosmicexplorer</a> in <a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>, it is possible to download only a portion of a wheel to obtain its metadata during dependency resolution.  Not only would this reduce the total amount of data transmitted over the network in case the resolver needs to perform heavy backtracking, but it would also create a synchronization point at the end of the resolution process where parallel downloading can be applied to the wheels actually needed &#40;some wheels solely serve their metadata during dependency backtracking and are never needed by the users&#41;.</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>: Add utility to lazily acquire wheel metadata over HTTP</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8584>GH-8584</a>: Revise lazy wheel and its tests</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8681>GH-8681</a>: Make range requests closer to chunk size &#40;help <a href=https://github.com/pypa/pip/pull/8670>GH-8670</a>&#41;</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8716>GH-8716</a> and <a href=https://github.com/pypa/pip/pull/8730>GH-8730</a>: Disable caching for range requests</p>
</li>
</ul>
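<p>The trick behind the lazy wheel: a wheel is a zip archive, and a zip&#39;s central directory sits at the end of the file, so fetching only the tail &#40;via an HTTP range request&#41; is enough to list the members and locate the metadata.  A Python sketch of the idea &#40;not pip&#39;s implementation&#41;:</p>
<pre><code class="language-python">import io
import zipfile

def tail_index(blob, tail_size=2048):
    """List zip members while reading only the last tail_size bytes."""
    tail = blob[-tail_size:]
    # Zero-pad the front so the offsets recorded in the
    # central directory remain valid.
    padded = b"\0" * (len(blob) - len(tail)) + tail
    return zipfile.ZipFile(io.BytesIO(padded)).namelist()</code></pre>
<p>Once the member names and offsets are known, another small request for just the <code>*.dist-info/METADATA</code> member yields the dependency information without downloading the whole wheel.</p>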
<h3 id="act_three_late_downloading">Act Three: Late Downloading</h3>
<p>During this act, the main work was refactoring to integrate the <em>lazy wheel</em> into <code>pip</code>&#39;s codebase and clear the way for download parallelization.</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8411>GH-8411</a>: Refactor <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8629>GH-8629</a>: Abstract away <code>AbstractDistribution</code> in higher-level resolver code</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8442>GH-8442</a>, <a href=https://github.com/pypa/pip/pull/8532>GH-8532</a> and <a href=https://github.com/pypa/pip/pull/8588>GH-8588</a> &#40;later reworked by <a href=https://github.com/chrahunt>@chrahunt</a> in <a href=https://github.com/pypa/pip/pull/8685>GH-8685</a>&#41;: Use lazy wheel to obtain dependency information for the new resolver</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8743>GH-8743</a>: Test hash checking for <code>fast-deps</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8804>GH-8804</a>: Check download directory before making range requests</p>
</li>
</ul>
<h3 id="act_four_batch_downloading_in_parallel">Act Four: Batch Downloading in Parallel</h3>
<p>The final act is mostly about the UI of the parallel download.  My work revolved around how the progress should be displayed and how other relevant information should be reported to the users.</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8710>GH-8710</a>: Revise method fetching metadata using lazy wheels</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8722>GH-8722</a>: Dedent late download logs &#40;fix <a href=https://github.com/pypa/pip/pull/8721>GH-8721</a>&#41;</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8737>GH-8737</a>: Add a hook for batch downloading</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8771>GH-8771</a>: Parallelize wheel download</p>
</li>
</ul>
<h2 id="the_side_quests">The Side Quests</h2>
<p>In order to keep the wheel turning &#40;no pun intended&#41; and avoid wasting time waiting for the pull requests above to be reviewed, I decided to create even more PRs &#40;as I am typing this, many of the patches listed below are nowhere near being merged&#41;.</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/7878>GH-7878</a>: Fail early when install path is not writable</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/7928>GH-7928</a>: Fix rst syntax in Getting Started guide</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/7988>GH-7988</a>: Fix tabulate col size in case of empty cell</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8137>GH-8137</a>: Add subcommand alias mechanism</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8143>GH-8143</a>: Make mypy happy with beta release automation</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8248>GH-8248</a>: Fix typo and simplify ireq call</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8332>GH-8332</a>: Add license requirement to <code>_vendor/README.rst</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8423>GH-8423</a>: Nitpick logging calls</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8435>GH-8435</a>: Use str.format style in logging calls</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8456>GH-8456</a>: Lint <code>src/pip/_vendor/README.rst</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8568>GH-8568</a>: Declare constants in configuration.py as such</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8571>GH-8571</a>: Clean up <code>Configuration.unset_value</code> and nit <code>__init__</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8578>GH-8578</a>: Allow verbose/quiet level to be specified via config files and environment variables</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8599>GH-8599</a>: Replace tabs by spaces for consistency</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8614>GH-8614</a>: Use <code>monkeypatch.setenv</code> to mock environment variables</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8674>GH-8674</a>: Fix <code>tests/functional/test_install_check.py</code>, when run with new resolver</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8692>GH-8692</a>: Make assertion failure give better message</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8709>GH-8709</a>: List downloaded distributions before exiting &#40;fix <a href=https://github.com/pypa/pip/pull/8696>GH-8696</a>&#41;</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8759>GH-8759</a>: Allow py2 deprecation warning from setuptools</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8766>GH-8766</a>: Use the new resolver for test requirements</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8790>GH-8790</a>: Mark tests using remote svn and hg as xfail</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8795>GH-8795</a>: Reformat a few spots in user guide</p>
</li>
</ul>
<h2 id="the_plot_summary">The Plot Summary</h2>
<p>Every Monday throughout the Summer of Code, I summarized what I had done in the week before in the form of either a short blog or an &#40;even shorter&#41; check-in.  These write-ups often contain handfuls of popular culture references and were originally hosted on <a href="https://blogs.python-gsoc.org/en/mcsinyxs-blog">Python GSoC</a>.</p>
<ul>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/1>First Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/1>Unexpected Things When You&#39;re Expecting</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/2>Second Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/2>The Wonderful Wizard of O&#39;zip</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/3>Third Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/3>I&#39;m Not Drowning On My Own</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/4>Fourth Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/4>I&#39;ve Walked 500 Miles…</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/5>Fifth Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/5>Sorting Things Out</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/6>Sixth Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/6>Parallelizing Wheel Downloads</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/checkin/7>Final Check-In</a></p>
</li>
<li><p><a href=https://cnx.gdn/blog/2020/gsoc/article/7>Outro</a></p>
</li>
</ul>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc@cnx%3E&Subject=Re: Google Summer of Code 2020">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Outro</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/7/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/7/index.html</guid>
  <description>GSoC 2020: Outro</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 31 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="outro">Outro</h1>
<blockquote>
<p>Steamed fish was amazing, matter of fact<br />Let me get some jerk chicken to go<br />Grabbed me one of them lemon pie theories<br />And let me get some of them benchmarks you theories too</p>
</blockquote>
<div class="franklin-toc"><ol><li>The Look</li><li>The Benchmark<ol><li>Average Distribution</li><li>Large Distribution</li><li>Distribution with Conflicting Dependencies</li></ol></li><li>What Now?</li></ol></div>
<h2 id="the_look">The Look</h2>
<p>At the time of writing, <a href=https://github.com/pypa/pip/pull/8771>implementation-wise parallel download is ready</a>:</p>
<p><a href="https://asciinema.org/a/356704"><img src="https://cnx.gdn/assets/pip-8771.svg" alt="asciicast" /></a></p>
<p>Does this mean I&#39;ve finished everything just in time?  This sounds too good to be true&#33;  And how does it perform?  Welp...</p>
<h2 id="the_benchmark">The Benchmark</h2>
<p>Here comes the bad news: under a decent connection to the package index, using <code>fast-deps</code> does not make <code>pip</code> faster.  For best comparison, I will time <code>pip download</code> on the following cases:</p>
<h3 id="average_distribution">Average Distribution</h3>
<p>For convenience, let&#39;s refer to the commands used as follows:</p>
<pre><code class="language-console">&#36; pip --no-cache-dir download &#123;requirement&#125;  # legacy-resolver
&#36; pip --use-feature&#61;2020-resolver \
   --no-cache-dir download &#123;requirement&#125;  # 2020-resolver
&#36; pip --use-feature&#61;2020-resolver --use-feature&#61;fast-deps \
   --no-cache-dir download &#123;requirement&#125;  # fast-deps</code></pre>
<p>In the first test, I used <a href="https://sr.ht/~cnx/axuy">axuy</a> and obtained the following results</p><table><tr><th align="right">legacy-resolver</th><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">7.709s</td><td align="right">7.888s</td><td align="right">10.993s</td></tr><tr><td align="right">7.068s</td><td align="right">7.127s</td><td align="right">11.103s</td></tr><tr><td align="right">8.556s</td><td align="right">6.972s</td><td align="right">10.496s</td></tr></table><p>Funny enough, running <code>pip download</code> with <code>fast-deps</code> in a directory where the files had already been downloaded still took around 7-8 seconds.  This is because to lazily download a wheel, <code>pip</code> has to <a href=https://github.com/pypa/pip/pull/8670>make many requests</a> which are apparently more expensive than actual data transmission on my network.</p>
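<p>To see why the requests add up, here is a toy sketch &#40;plain Python, not <code>pip</code>&#39;s actual implementation&#41; where each <code>read</code> on an in-memory stand-in for a wheel models one HTTP range request: <code>zipfile</code> only needs a handful of small reads near the end of the archive to reach the metadata, but each of those reads would be a full network round trip.</p>

```python
import io
import zipfile

class CountingRangeReader:
    """File-like view of a remote archive; each read() models one range request."""
    def __init__(self, blob):
        self.blob = blob      # stands in for the wheel hosted on the index
        self.pos = 0
        self.requests = 0     # simulated HTTP range requests made
        self.fetched = 0      # bytes actually transferred

    def seekable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:                 # io.SEEK_END
            self.pos = len(self.blob) + offset
        return self.pos

    def read(self, size=-1):
        if size < 0:
            size = len(self.blob) - self.pos
        self.requests += 1    # one more round trip to the server
        data = self.blob[self.pos:self.pos + size]
        self.pos += len(data)
        self.fetched += len(data)
        return data

# An in-memory stand-in for a wheel: mostly code, plus a tiny METADATA.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as whl:
    whl.writestr('example/__init__.py', 'x' * 100_000)
    whl.writestr('example-1.0.dist-info/METADATA', 'Name: example\n')

reader = CountingRangeReader(buf.getvalue())
with zipfile.ZipFile(reader) as lazy:
    names = lazy.namelist()
    metadata = lazy.read('example-1.0.dist-info/METADATA')
```

<p>Out of the roughly 100 kB archive only a few hundred bytes get fetched, yet several separate reads are made, and on a fast connection those round trips are exactly the cost that dominates.</p>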
<div class="admonition note"><p class="admonition-title">When is it useful then?</p><p>With an unstable connection to PyPI &#40;for a reason I am not confident enough to state&#41;, this is what I got</p><table><tr><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">1m16.134s</td><td align="right">0m54.894s</td></tr><tr><td align="right">1m0.384s</td><td align="right">0m40.753s</td></tr><tr><td align="right">0m50.102s</td><td align="right">0m41.988s</td></tr></table><p>As the connection was <em>unstable</em> and the majority of <code>pip</code> networking is performed in CI/CD environments with large and stable bandwidth, I am unsure what this result is supposed to tell &#40;-;</p>
</div>
<h3 id="large_distribution">Large Distribution</h3>
<p>In this test, I used <a href="https://www.tensorflow.org">TensorFlow</a> as the requirement and obtained the following figures:</p><table><tr><th align="right">legacy-resolver</th><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">0m52.135s</td><td align="right">0m58.809s</td><td align="right">1m5.649s</td></tr><tr><td align="right">0m50.641s</td><td align="right">1m14.896s</td><td align="right">1m28.168s</td></tr><tr><td align="right">0m49.691s</td><td align="right">1m5.633s</td><td align="right">1m22.131s</td></tr></table><h3 id="distribution_with_conflicting_dependencies">Distribution with Conflicting Dependencies</h3>
<p>A requirement that triggers a decent amount of backtracking with the current implementation of the new resolver is <code>oslo-utils&#61;&#61;1.4.0</code>:</p><table><tr><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">14.497s</td><td align="right">24.010s</td></tr><tr><td align="right">17.680s</td><td align="right">28.884s</td></tr><tr><td align="right">16.541s</td><td align="right">26.333s</td></tr></table><h2 id="what_now">What Now?</h2>
<p>I don&#39;t know, to be honest.  At this point I feel I&#39;ve failed my own expectations &#40;and those of other <code>pip</code> stakeholders&#41; and wasted the time and effort <code>pip</code>&#39;s maintainers spent reviewing the dozens of PRs I&#39;ve made in the last three months.</p>
<p>On the bright side, this has been an opportunity for me to explore the codebase of a package manager and discover various edge cases that the new resolver has yet to cover &#40;e.g. I&#39;ve just noticed that <code>pip download</code> would save to-be-discarded distributions; I&#39;ll file an issue on that soon&#41;.  Plus I got to know many new and cool people and ideas, which I hope will make me a more helpful contributor to Python packaging in the future.</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/7@cnx%3E&Subject=Re: Outro">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/7@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/7/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Parallelizing Wheel Downloads</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/6/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/6/index.html</guid>
  <description>GSoC 2020: Parallelizing Wheel Downloads</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="parallelizing_wheel_downloads">Parallelizing Wheel Downloads</h1>
<blockquote>
<p>And now it&#39;s clear as this promise<br />That we&#39;re making<br />Two progress bars into one</p>
</blockquote><p>Hello there&#33; It has been raining a lot lately and a mosquito has given me dengue fever today.  To whoever is reading this, I hope it never happens to you.</p>
<h2>Download Parallelization</h2>
<p>I&#39;ve been working on <code>pip</code>&#39;s download parallelization for quite a while now. As distribution download in <code>pip</code> was modeled as a lazily evaluated iterable of chunks, parallelizing such a procedure is as simple as submitting routines that write files to disk to a worker pool.</p>
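<p>The submission step can be sketched as follows &#40;a simplified model with hypothetical names, not <code>pip</code>&#39;s actual code&#41;: each download is a generator of chunks, and one routine per distribution drains it onto disk, here played by a dict.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def iter_chunks(name):
    """Stand-in for a lazily evaluated download: a generator of byte chunks."""
    for seq in range(3):
        yield f'{name}:{seq};'.encode()

def save(name, disk):
    """Routine submitted to the pool: drain one download onto 'disk'."""
    disk[name] = b''.join(iter_chunks(name))
    return name

disk = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    # One task per distribution; results come back in submission order.
    saved = list(pool.map(lambda name: save(name, disk), ['foo', 'bar', 'baz']))
```

<p>With <code>ThreadPoolExecutor.map</code>, results come back in submission order even though the downloads overlap in time.</p>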
<p>Or at least that is what I thought.</p>
<h2>Progress Reporting UI</h2>
<p><code>pip</code> currently uses custom progress reporting classes, which were not designed to work with multithreaded code.  At first, I wanted to try using these instead of defining a separate UI for multithreaded progress. As they use system signals for termination, the progress bars have to run in the main thread.  Or sort of.</p>
<p>Since the progress bars are designed as iterators, I realized that we can call <code>next</code> on them.  So I quickly threw in some queues and locks, and prototyped the first <em>working</em> <a href=https://github.com/pypa/pip/pull/8771>implementation of
progress synchronization</a>.</p>
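<p>The idea of that prototype can be sketched as follows &#40;a simplified model, not <code>pip</code>&#39;s actual code; plain counting iterators stand in for the progress bar classes&#41;: worker threads push updates onto a queue, and only the main thread ever calls <code>next</code> on a bar.</p>

```python
import queue
import threading

CHUNKS = {'foo': 3, 'bar': 2}         # hypothetical downloads and their sizes

def worker(name, updates):
    """Simulated download thread: one update per chunk, then a 'done' mark."""
    for _ in range(CHUNKS[name]):
        updates.put(name)
    updates.put((name, 'done'))

updates = queue.Queue()
threads = [threading.Thread(target=worker, args=(name, updates))
           for name in CHUNKS]
for thread in threads:
    thread.start()

# The "progress bars": iterators advanced only from the main thread.
bars = {name: iter(range(1, CHUNKS[name] + 1)) for name in CHUNKS}
progress = dict.fromkeys(CHUNKS, 0)
pending = len(CHUNKS)
while pending:
    update = updates.get()
    if isinstance(update, tuple):     # a download has finished
        pending -= 1
    else:                             # advance that download's bar
        progress[update] = next(bars[update])
for thread in threads:
    thread.join()
```

<p>Locks only become necessary once real bars share drawing state; here the queue alone already serializes all bar updates onto the main thread.</p>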
<h2>Performance Issues</h2>
<p>Welp, I only said that it works, but I didn&#39;t mention the performance, which is terrible.  I am pretty sure that the slowdown lies in the synchronization, since the <code>map_multithread</code> call doesn&#39;t seem to trigger anything that may introduce any sort of blocking.</p>
<p>This seems like a lot of fun, and I hope I&#39;ll get better tomorrow to continue playing with it&#33;</p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/6@cnx%3E&Subject=Re: Parallelizing Wheel Downloads">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/6@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/6/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Sorting Things Out</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/5/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/5/index.html</guid>
  <description>GSoC 2020: Sorting Things Out</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 03 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="sorting_things_out">Sorting Things Out</h1>
<p>Hi&#33;  I really hope that everyone reading this is still doing okay, and if that isn&#39;t the case, I wish you a good day&#33;</p>
<h2 id="pip_202_released"><code>pip</code> 20.2 Released&#33;</h2>
<p>Last Wednesday, <code>pip</code> 20.2 was released, delivering the <code>2020-resolver</code> as well as many other improvements&#33;  I was lucky to be able to get the <code>fast-deps</code> feature included as part of the release. A brief description of this <em>experimental</em> feature as well as testing instructions can be found on <a href="https://discuss.python.org/t/announcement-pip-20-2-release/4863/2">Python Discuss</a>.</p>
<p>The public exposure of the feature also reminds me of some further <a href=https://github.com/pypa/pip/pull/8681>optimization</a> to make on <a href=https://github.com/pypa/pip/pull/8670>the lazy wheel</a>.  Hopefully, even without download parallelization, it will not be too slow and put off testing by interested users of <code>pip</code>.</p>
<h2 id="preparation_for_download_parallelization">Preparation for Download Parallelization</h2>
<p>As of this moment, we already have:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8162#issuecomment-667504162>Multithreading pool fallback working</a></p>
</li>
<li><p>An opt-in to use lazy wheel to obtain dependency information, thus getting a list of wheels ready to be downloaded together at the end of resolution</p>
</li>
</ul>
<p>What&#39;s left is <em>only</em> to interject a parallel download somewhere after the dependency resolution step.  Still, I struggled with this way more than I&#39;d ever imagined.  I got so stuck that I had to give myself a day off in the middle of the week &#40;and study some Rust&#41;, then I came up with <a href=https://github.com/pypa/pip/pull/8638>something that was agreed to be difficult to maintain</a>.</p>
<p>Indeed, a large part of this is my fault, for not communicating the design thoroughly with <code>pip</code>&#39;s maintainers and not carefully noting stuff down during &#40;verbal&#41; discussions with my mentor.  Thankfully <a href=https://github.com/pypa/pip/pull/8685>Chris Hunt came to the rescue</a> and did a refactoring that will make my future work much easier and cleaner.</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/5@cnx%3E&Subject=Re: Sorting Things Out">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/5@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/5/comments.xml</wfw:commentRss>
</item>
<item>
  <title>I&#39;ve Walked 500 Miles…</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/4/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/4/index.html</guid>
  <description>GSoC 2020: I&#39;ve Walked 500 Miles…</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 20 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="ive_walked_500_miles">I&#39;ve Walked 500 Miles…</h1>
<blockquote>
<p>... and I would walk 500 more<br />Just to be the man who walks a thousand miles<br />To fall down at your door</p>
<p><img src="https://cnx.gdn/assets/500-miles.gif" alt="500 miles" /></p>
</blockquote>
<div class="franklin-toc"><ol><li>The Main Road</li><li>The Side Quests</li><li>Snap Back to Reality</li></ol></div>
<h2 id="the_main_road">The Main Road</h2>
<p>Hi, have you met <code>fast-deps</code>?  It&#39;s &#40;going to be&#41; the name of <code>pip</code>&#39;s experimental feature that may improve the speed of dependency resolution of the new resolver.  By avoiding downloading whole wheels just to obtain metadata, it is especially helpful when <code>pip</code> has to do heavy backtracking to resolve conflicts.</p>
<p>Thanks to <a href=https://github.com/pypa/pip/pull/8532#discussion_r453990728>Chris Hunt's review on GH-8537</a>, my mentor Pradyun Gedam and I worked out a less hacky approach to inject the call to lazy wheel during the resolution process.  A new PR <a href=https://github.com/pypa/pip/pull/8588>GH-8588</a> was filed to implement it—I could have <em>just</em> worked on top of the old PR and rebased, but my <code>git</code> skill is far from gud enuff to confidently do it.</p>
<p>Testing this one has been a lot of fun though.  At first, integration tests were added as a rerun of the tests for the new resolver, with an additional flag to use the <code>fast-deps</code> feature.  It indeed made me feel guilty towards <a href="https://travis-ci.com">Travis</a>, who had to work around 30 minutes more every run. Per Chris Hunt&#39;s suggestion, in the new PR I instead wrote a few functional tests for the areas relating most to the feature, namely <code>pip</code>&#39;s subcommands <code>wheel</code>, <code>download</code> and <code>install</code>.</p>
<p>It was also suggested that a mock server with HTTP range request support might be better &#40;in terms of performance and reliability&#41; for testing. However, <a href=https://github.com/pypa/pip/pull/8584#issuecomment-659227702>I have yet to be able to make Werkzeug do it</a>.</p>
<p>Why did I say I&#39;m halfway there?  With the parallel utilities merged and a way to quickly get the list of distributions to be downloaded really close, what&#39;s left is <em>only</em> to figure out a way to properly download them in parallel. With no distribution to be added during the download process, this model will fit very well with the architecture in <a href="https://cnx.gdn/assets/pip-parallel-dl.pdf">my original proposal</a>. A batch downloader can be implemented to track the progress of each download and thus report them cleanly, e.g. as progress bars or percentages. This is the part I am second-most excited about in my GSoC project this summer &#40;after the synchronization of downloads written in my proposal, which was then superseded by <code>fast-deps</code>&#41; and I can&#39;t wait to do it&#33;</p>
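<p>Such a batch downloader might look roughly like this &#40;a sketch under the assumption that each file&#39;s size is known upfront; all names are made up&#41;: workers report each received chunk into shared state, from which an overall percentage is derived.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

BATCH = {'a.whl': 4, 'b.whl': 6}      # hypothetical files -> size in chunks

received = dict.fromkeys(BATCH, 0)
lock = Lock()
percentages = []                      # overall progress after each chunk

def download(name):
    """Simulated download reporting per-chunk progress into shared state."""
    for _ in range(BATCH[name]):
        with lock:
            received[name] += 1
            done, total = sum(received.values()), sum(BATCH.values())
            percentages.append(100 * done // total)
    return name

with ThreadPoolExecutor(max_workers=2) as pool:
    finished = list(pool.map(download, BATCH))
```

<p>Because every update goes through one lock, the percentage sequence is monotonic, which is what would let it drive a single clean progress bar.</p>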
<h2 id="the_side_quests">The Side Quests</h2>
<p>As usual, I make sure that I complete every side quest I see during the journey:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8568>GH-8568</a>: Declare constants in <code>configuration.py</code> as such</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8571>GH-8571</a>: Clean up <code>Configuration.unset_value</code> and nit the class&#39; <code>__init__</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8578>GH-8578</a>: Allow verbose/quiet level to be specified via config file and env var</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8599>GH-8599</a>: Replace tabs by spaces for consistency</p>
</li>
</ul>
<h2 id="snap_back_to_reality">Snap Back to Reality</h2>
<p>A bit about me, I actually walked 500 meters earlier today to a bank and walked 500 more to another to prepare my Visa card for purchasing the upcoming <a href="https://wiki.pine64.org/index.php/PinePhone">PinePhone</a> prototype.  It&#39;s one of the first smartphones to fully support a GNU/Linux distribution, where one can run desktop apps &#40;including proper terminals&#41; as well as traditional services like SSH, HTTP server and IPFS node because why not?  Just a few hours ago, I pre-ordered the <a href="https://postmarketos.org/blog/2020/07/15/pinephone-ce-preorder/">postmarketOS community edition</a> with additional hardware for convergence.</p>
<p>If you did not come here for a PinePhone ad, please take my apologies though d-; and to ones reading this, I hope you all can become the person who walks a thousand miles to fall down at the door opening to all what you ever wished for&#33;</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/4@cnx%3E&Subject=Re: I&#39;ve Walked 500 Miles…">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/4@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/4/comments.xml</wfw:commentRss>
</item>
<item>
  <title>I&#39;m Not Drowning On My Own</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/3/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/3/index.html</guid>
  <description>GSoC 2020: I&#39;m Not Drowning On My Own</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="im_not_drowning_on_my_own">I&#39;m Not Drowning On My Own</h1>
<div class="franklin-toc"><ol><li>Cold Water</li><li>Warm Water</li><li>Learning How To Swim</li><li>Diving Plan</li></ol></div>
<h2 id="cold_water">Cold Water</h2>
<p>Hello there&#33;  My school year is coming to an end, with some final assignments and group projects left to be done.  I sure underestimated their workload, and in the last &#40;and probably next&#41; few days I&#39;m drowning in work trying to meet my deadlines.</p>
<p>One project that might be remotely relevant is <a href="https://github.com/McSinyx/cheese-shop">cheese-shop</a>, which tries to manage the metadata of packages from the real <a href="https://pypi.org">Cheese Shop</a>.  Other than that, schoolwork is draining a lot of my time and I can&#39;t remember the last time I came up with something new for my GSoC project &#41;-;</p>
<h2 id="warm_water">Warm Water</h2>
<p>On the bright side, I received a lot of help and encouragement from contributors and stakeholders of <code>pip</code>.  In the last week alone, I had five pull requests merged:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8332>GH-8332</a>: Add license requirement to <code>_vendor/README.rst</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8320>GH-8320</a>: Add utilities for parallelization</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8504>GH-8504</a>: Parallelize <code>pip list --outdated</code> and <code>--uptodate</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8411>GH-8411</a>: Refactor <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>: Add utility to lazily acquire wheel metadata over HTTP</p>
</li>
</ul>
<p>In addition to helping me get my PRs merged, my mentor Pradyun Gedam also gave me my first official feedback, including what I&#39;m doing right &#40;and wrong too&#33;&#41; and what I should keep doing to increase the chance of the project being successful.</p>
<p><a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>&#39;s roadmap &#40;Danny McClanahan&#39;s discoveries and work on lazy wheels&#41; is being closely tracked by <code>hatch</code>&#39;s maintainer Ofek Lev, which really makes me proud and warms my heart: what I&#39;m helping build is actually needed by the community&#33;</p>
<h2 id="learning_how_to_swim">Learning How To Swim</h2>
<p>With <a href=https://github.com/pypa/pip/pull/8467>GH-8467</a> and <a href=https://github.com/pypa/pip/pull/8530>GH-8530</a> merged, I&#39;m now working on <a href=https://github.com/pypa/pip/pull/8532>GH-8532</a> which aims to roll out the lazy wheel as the way to obtain dependency information via the CLI flag <code>--use-feature&#61;lazy-wheel</code>.</p>
<p><a href=https://github.com/pypa/pip/pull/8532>GH-8532</a> was failing initially, despite being relatively trivial and the commit it was based on passing.  Surprisingly, after rebasing it on top of <a href=https://github.com/pypa/pip/pull/8530>GH-8530</a>, it mysteriously became green.  After the first &#40;early&#41; review, I was able to iterate on my earlier code, which used the ambiguous exception <code>RuntimeError</code>.</p>
<p>What remains is <em>just</em> adding some functional tests &#40;I&#39;m pretty sure this will be either overwhelming or underwhelming&#41; to make sure that the command-line flag is working correctly.  Hopefully this can make it into the beta of the upcoming release <a href=https://github.com/pypa/pip/pull/8511>this month</a>.</p>
<p><img src="https://cnx.gdn/assets/lazy-wheel.jpg" alt="Lazy wheel" /></p>
<p>In other news, I&#39;ve also submitted <a href=https://github.com/pypa/pip/pull/8538>a patch improving the tests for the parallelization utilities</a>, which were really messy when I first wrote them. Better late than never&#33;</p>
<p>Metaphors aside, I actually can&#39;t swim d-:</p>
<h2 id="diving_plan">Diving Plan</h2>
<p>After <a href=https://github.com/pypa/pip/pull/8532>GH-8532</a>, I think I&#39;ll try to parallelize downloads of wheels that are lazily fetched only for metadata.  By the current implementation of the new resolver, for <code>pip install</code>, this can be injected directly between the resolution and build/installation process.</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/3@cnx%3E&Subject=Re: I&#39;m Not Drowning On My Own">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/3@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/3/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Teredo Tunnel Simulation</title>
  <link>https://cnx.gdn/blog/teredo/index.html</link>
  <guid>https://cnx.gdn/blog/teredo/index.html</guid>
  <description>Teredo tunnel simulation in virtual machines</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Fri, 03 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="teredo_tunnel_simulation">Teredo Tunnel Simulation</h1>
<p>Internet Protocol version 6 &#40;IPv6&#41;, the most recent version of the Internet Protocol, was developed by the IETF to deal with the long-anticipated problem of IPv4 address exhaustion.  Despite being superior to IPv4 in multiple aspects &#40;e.g. larger address space, extension headers&#41;, IPv6 has not been widely adopted, even though it was semi-standardized in 1998 and fully standardized in 2017.<sup id="fnref:rfc8200">[1]</sup></p>
<p>During the transition period, Teredo tunneling has been used to give IPv6 connectivity to IPv6-capable hosts that are on the IPv4 Internet but have no native connection to an IPv6 network.<sup id="fnref:rfc4380">[2]</sup> In this article, I will demonstrate a way to set such a tunnel up on virtual machines, then examine the packets being sent by IPv6 nodes connected by the tunnel.</p>
<div class="franklin-toc"><ol><li>Configuration<ol><li>Virtual Machines</li><li>Teredo Tunnel Setup</li><li>Teredo Tunnel Usage</li></ol></li><li>Analysis<ol><li>Packets Capturing</li><li>Packet Contents<ol><li>Ethernet Header</li><li>IPv4 Header</li><li>IPv6 Header</li><li>ICMPv6</li></ol></li></ol></li><li>Conclusion</li></ol></div>
<h2 id="configuration">Configuration</h2><figure>
  <a href=https://video.hardlimit.com/w/d4dYuXm6p4g9sCWm156qkg>
    <img src=https://cnx.gdn/assets/teredo.png alt=Screenshot>
  </a>
  <figcaption>Teredo Tunnel Simulation Speedrun</figcaption>
</figure><h3 id="virtual_machines">Virtual Machines</h3>
<p>In order to simulate Teredo tunneling, one needs two IPv6 nodes and two routers with both IPv4 and IPv6 access.  In total, four virtual machines need to be set up, so I went for Void Linux, which is known for its low memory footprint thanks to using <code>runit</code> instead of <code>systemd</code>. To minimize resource usage and speed up the setup process, I chose the barebones live image, which uses <code>musl</code> instead of <code>glibc</code>.  At boot, the image uses only 40 MB of memory.</p>
<p>For virtualization, I used <code>virt-manager</code>, simply because it is available in Debian&#39;s repository &#40;my host OS&#41;.  For some reason, on <code>amd64</code>, the kernel refuses to boot until I give it over 200 MB, but apparently that is still a really modest number.  Networking is provided to the guest OSes via NAT with default configurations.</p>
<p>It is worth mentioning that through <code>virtio</code>, one may use SSH to log into the guest systems from the host OS.  I find this especially convenient as it enables me to copy and paste not only commands but also IP addresses between host and guests as well as between guests.</p>
<p>For convenience, from now on, the outside nodes will be referred to as PC A and PC B, while the routers are named Router A and Router B.  Upon boot, they were given an Ethernet interface <code>eth0</code> with the following addresses.</p><table><tr><th align="left">Node</th><th align="left">MAC address</th><th align="left">IPv4 address</th></tr><tr><td align="left">Router A</td><td align="left"><code>52:54:00:f0:85:c7</code></td><td align="left"><code>192.168.122.127</code></td></tr><tr><td align="left">Router B</td><td align="left"><code>52:54:00:2b:01:cc</code></td><td align="left"><code>192.168.122.134</code></td></tr><tr><td align="left">PC A</td><td align="left"><code>52:54:00:3b:82:36</code></td><td align="left"><code>192.168.122.86</code></td></tr><tr><td align="left">PC B</td><td align="left"><code>52:54:00:7b:ed:c0</code></td><td align="left"><code>192.168.122.255</code></td></tr></table><p>Local IPv6 addresses were also given but we are not going to need them.</p>
<h3 id="teredo_tunnel_setup">Teredo Tunnel Setup</h3>
<p>First, I set up an IPv4 tunnel between the two routers:</p>
<pre><code class="language-sh"># On Router A
ip tunnel add tunn mode sit remote 192.168.122.134 ttl 255
ip link set tunn up
# On Router B
ip tunnel add tunn mode sit remote 192.168.122.127 ttl 255
ip link set tunn up</code></pre>
<p>For this tunnel to be able to act as a Teredo one, the two routers need to have IPv6 addresses prefixed by <code>2001::/32</code>.<sup id="fnref:rfc4380">[2]</sup></p>
<pre><code class="language-sh"># On Router A
ip -6 addr add 2001:2::1/64 dev eth0
# On Router B
ip -6 addr add 2001:3::1/64 dev eth0</code></pre>
<p>Finally, I routed all remaining IPv6 traffic through the tunnel and enabled IPv6 forwarding:</p>
<pre><code class="language-sh">ip -6 route add default dev tunn
sysctl -w net.ipv6.conf.all.forwarding&#61;1</code></pre>
<h3 id="teredo_tunnel_usage">Teredo Tunnel Usage</h3>
<p>The IPv6 addresses of the PCs were set up as follows &#40;<code>0x8067</code> is <code>PC</code> in ASCII&#41;.</p>
<pre><code class="language-sh"># On PC A
ip -6 address add 2001:2::8067/64 dev eth0
# On PC B
ip -6 address add 2001:3::8067/64 dev eth0</code></pre>
<p>By giving both Router A and PC A addresses prefixed by <code>2001:2::/64</code> &#40;similarly for Router B and PC B&#41;, I ensured that they could find each other through the local IPv6 network, for example on PC B:</p>
<pre><code class="language-console">&#36; ip -6 route | head -n1
2001:3::/64 dev eth0 proto kernel metric 256 pref medium</code></pre>
<p>To use the newly created tunnel, the PCs simply had to be routed directly to the routers:</p>
<pre><code class="language-sh"># On PC A
ip -6 route add default via 2001:2::1
# On PC B
ip -6 route add default via 2001:3::1</code></pre>
<p>The connection could then be verified by running on PC A:</p>
<pre><code class="language-console">&#36; traceroute 2001:3::8067
traceroute to 2001:3::8067 &#40;2001:3::8067&#41;, 30 hops max, 80 byte packets
 1  2001:2::1 &#40;2001:2::1&#41;  0.572 ms  0.441 ms  0.328 ms
 2  2001:3::1 &#40;2001:3::1&#41;  0.906 ms  0.888 ms  1.049 ms
 3  2001:3::8067 &#40;2001:3::8067&#41;  1.325 ms  1.174 ms  1.091 ms</code></pre>
<h2 id="analysis">Analysis</h2>
<p>To gain further understanding of how packets are transferred over the Teredo tunnel, I captured and took a closer look at some of them.</p>
<h3 id="packets_capturing">Packets Capturing</h3>
<p>Fortunately for me<sup id="fnref:ipfs">[3]</sup>, all traffic of the guest OSes was wired to a separate interface named <code>virbr0</code>.  To capture packets going through the tunnel, I simply had to tell Wireshark to listen on that interface, while letting PC A ping PC B through IPv6: <code>ping -c1 2001:3::8067</code>. I then skimmed through the packets sent between the two nodes and looked for the IPv6-in-IPv4 ones.</p>
<h3 id="packet_contents">Packet Contents</h3>
<p>Captured IPv6-in-IPv4 packets look exactly like how I imagined them to be. The content of the ping request can be partially decoded as follows.</p>
<h4 id="ethernet_header">Ethernet Header</h4>
<ul>
<li><p><code>52 54 00 2b 01 cc</code>: MAC address of Router B &#40;destination&#41;</p>
</li>
<li><p><code>52 54 00 f0 85 c7</code>: MAC address of Router A &#40;source&#41;</p>
</li>
<li><p><code>08 00</code>: EtherType of IPv4</p>
</li>
</ul>
<h4 id="ipv4_header">IPv4 Header</h4>
<ul>
<li><p><code>45 00 00 7c 9b 43 40 00 ff</code>: Version, header length, type of service, total length, identification, fragmentation flags and TTL</p>
</li>
<li><p><code>29</code>: Protocol of <em>IPv6</em></p>
</li>
<li><p><code>69 be</code>: Checksum</p>
</li>
<li><p><code>c0 a8 7a 86</code>: IPv4 address of Router A &#40;source&#41;</p>
</li>
<li><p><code>c0 a8 7a 7f</code>: IPv4 address of Router B &#40;destination&#41;</p>
</li>
</ul>
<h4 id="ipv6_header">IPv6 Header</h4>
<ul>
<li><p><code>60 00 07 e7 00 40</code>: Version, traffic class, flow label and payload length &#40;64 bytes&#41;</p>
</li>
<li><p><code>3a</code>: Next header &#40;ICMPv6&#41;</p>
</li>
<li><p><code>3f</code>: Hop limit of 63</p>
</li>
<li><p><code>20 01 00 02 00 00 00 00 00 00 00 00 00 00 80 67</code>: PC A&#39;s IPv6 address</p>
</li>
<li><p><code>20 01 00 03 00 00 00 00 00 00 00 00 00 00 80 67</code>: PC B&#39;s IPv6 address</p>
</li>
</ul>
<h4 id="icmpv6">ICMPv6</h4>
<ul>
<li><p><code>80</code>: Type of ping request</p>
</li>
<li><p><code>00 cf be 03 d9 00 01</code>: Code, checksum, identifier and sequence number</p>
</li>
<li><p><code>e3 0d fe 5e 00 00 00 00 bc d6 0e 00 00 00
  00 00 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d
  1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d
  2e 2f 30 31 32 33 34 35 36 37</code>: Binary data to be echoed</p>
</li>
</ul>
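<p>As a sanity check, the field boundaries above can be verified with a short Python sketch using the standard <code>struct</code> module.  The offsets follow RFC 791 and RFC 8200; the echoed payload is left out for brevity:</p>

```python
# Reassemble the captured frame from the bytes listed above
# and verify the interesting header fields.
import struct

frame = bytes.fromhex(
    '5254002b01cc'        # destination MAC
    '525400f085c7'        # source MAC
    '0800'                # EtherType: IPv4
    '4500007c9b434000ff'  # IPv4 version/IHL, ToS, length, ID, flags, TTL
    '29'                  # IPv4 protocol: 41, i.e. encapsulated IPv6
    '69be'                # IPv4 header checksum
    'c0a87a86c0a87a7f'    # IPv4 source and destination addresses
    '600007e70040'        # IPv6 version, traffic class, flow, payload length
    '3a'                  # next header: 58, i.e. ICMPv6
    '3f'                  # hop limit: 63
    '20010002000000000000000000008067'  # IPv6 source (PC A)
    '20010003000000000000000000008067'  # IPv6 destination (PC B)
    '80')                 # ICMPv6 type: 128, i.e. echo request

ethertype, = struct.unpack_from('!H', frame, 12)  # right after the two MACs
protocol = frame[23]  # byte 9 of the IPv4 header
ipv6 = frame[34:]     # skip 14-byte Ethernet and 20-byte IPv4 headers
payload_length, next_header, hop_limit = struct.unpack_from('!HBB', ipv6, 4)

assert ethertype == 0x0800 and protocol == 41
assert next_header == 58 and hop_limit == 63 and payload_length == 64
assert ipv6[40] == 0x80  # ICMPv6 starts right after the 40-byte IPv6 header
```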
<h2 id="conclusion">Conclusion</h2>
<p>Via the activities elaborated above, the procedure to set up a Teredo tunnel and the content of the packets travelling through it could be well understood. This understanding may help facilitate the adoption of IPv6, even for nodes having no native connection to an IPv6 network.  I hope that IPv6 adoption will grow fast enough that I can see the day when workarounds like this tunnel are deprecated.</p>
<table class="fndef" id="fndef:rfc8200">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content"><a href="https://tools.ietf.org/html/rfc8200">RFC 8200</a></td>
    </tr>
</table><table class="fndef" id="fndef:rfc4380">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content"><a href="https://tools.ietf.org/html/rfc4380">RFC 4380</a></td>
    </tr>
</table><table class="fndef" id="fndef:ipfs">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Aside from web browsing, I also run an IPFS node and a bunch of local servers.  I probably need to retire some of them soon since they really clutter the traffic.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/teredo@cnx%3E&Subject=Re: Teredo Tunnel Simulation">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/teredo@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/teredo/comments.xml</wfw:commentRss>
</item>
<item>
  <title>The Wonderful Wizard of O&#39;zip</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/2/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/2/index.html</guid>
  <description>GSoC 2020: The Wonderful Wizard of O&#39;zip</description>
  <category>gsoc</category><category>pip</category><category>python</category><category>net</category>
  <pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="the_wonderful_wizard_of_ozip">The Wonderful Wizard of O&#39;zip</h1>
<blockquote>
<p>Never give up... No one knows what&#39;s going to happen next.</p>
</blockquote>
<div class="franklin-toc"><ol><li>Preface</li><li>The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</li><li>The file-like object mapping ZIP over HTTP</li><li>What&#39;s next?</li></ol></div>
<h2 id="preface">Preface</h2>
<p>Greetings and best wishes&#33;  I had a lot of fun during the last week, although admittedly nothing was really finished.  In summary, this is the work I carried out in the last seven days:</p>
<ul>
<li><p>Finalizing <a href=https://github.com/pypa/pip/pull/8320>utilities for parallelization</a></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8467>Continuing to experiment</a> on <a href=https://github.com/pypa/pip/pull/8442>using lazy wheels for dependency resolution</a></p>
</li>
<li><p>Polishing up <a href=https://github.com/pypa/pip/pull/8411>the patch</a> refactoring <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p>Adding <code>flake8-logging-format</code> <a href=https://github.com/pypa/pip/pull/8423#issuecomment-645418725>to the linter</a></p>
</li>
<li><p>Splitting <a href=https://github.com/pypa/pip/pull/8456>the linting patch</a> from <a href=https://github.com/pypa/pip/pull/8332>the PR adding
  the license requirement to vendor README</a></p>
</li>
</ul>
<h2 id="the_multiprocessingdummy_wrapper">The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</h2>
<p>Yes, you read it right, this is the same section as last fortnight&#39;s blog. My mentor Pradyun Gedam gave me a green light to have <a href=https://github.com/pypa/pip/pull/8411>GH-8411</a> merged without support for Python 2 and the non-lazy map variant, which turns out to be troublesome for multithreading.</p>
<p>The tests still need to pass of course, and the flaky tests &#40;see failing tests over Azure Pipelines in the past&#41; really gave me a panic attack earlier today. We probably need to mark them as xfail or investigate why they are nondeterministic specifically on Azure, but the real reason I was <em>all caught up and confused</em> was that the unit tests I added mess with the cached imports, and as <code>pip</code>&#39;s tests are run in parallel, who knows what they might affect. I was so relieved not to discover any new set of tests made flaky by the ones I&#39;m trying to add&#33;</p>
<h2 id="the_file-like_object_mapping_zip_over_http">The file-like object mapping ZIP over HTTP</h2>
<p>This is where the fun starts.  Before we dive in, let&#39;s recall some background information.  As discovered by Danny McClanahan in <a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>, it is possible to download only a portion of a wheel and it&#39;s still valid for <code>pip</code> to get the distribution&#39;s metadata. In the same thread, Daniel Holth suggested that one may use HTTP range requests to specifically ask for the tail of the wheel, where the ZIP&#39;s central directory record, as well as &#40;usually&#41; <code>dist-info</code> &#40;the directory containing <code>METADATA</code>&#41;, can be found.</p>
<p>Well, <em>usually</em>.  While <a href=https://www.python.org/dev/peps/pep-0427>PEP 427</a> does indeed recommend</p>
<blockquote>
<p>Archivers are encouraged to place the <code>.dist-info</code> files physically at the end of the archive.  This enables some potentially interesting ZIP tricks including the ability to amend the metadata without rewriting the entire archive.</p>
</blockquote>
<p>one of the mentioned <em>tricks</em> is adding shared libraries to wheels of extension modules &#40;using e.g. <code>auditwheel</code> or <code>delocate</code>&#41;. Thus for non-pure-Python wheels, it is unlikely that the metadata lies in the last few megabytes.  Ignoring source distributions is bad enough; we can&#39;t afford an optimization that doesn&#39;t work for extension modules, which are still an integral part of the Python ecosystem &#41;-:</p>
<p>But hey, the ZIP&#39;s central directory record is guaranteed to be at the end of the file&#33; Couldn&#39;t we do something about that?  The short answer is yes.  The long answer is, well, yessssssss&#33; With that, plus magic provided by most operating systems, this is what we figured out:</p>
<ol>
<li><p>We can download a relatively small chunk at the end of the wheel until it is recognizable as a valid ZIP file.</p>
</li>
<li><p>In order for the end of the archive to actually appear as the end to <code>zipfile</code>, we feed it an object with <code>seek</code> and <code>read</code> defined. As navigating to the rear of the file is performed by calling <code>seek</code> with a relative offset and <code>whence&#61;SEEK_END</code> &#40;see <code>man 3 fseek</code> for more details&#41;, we are completely able to make a wheel in the cloud behave as if it were available locally.</p>
<p><img src="https://cnx.gdn/assets/cloud.gif" alt="Wheel in the cloud" /></p>
</li>
<li><p>For large wheels, it is better to store them on disk instead of in memory. For smaller ones, it is also preferable to store them as files to avoid &#40;error-prone and often inefficient&#41; manual tracking and joining of downloaded segments.  We only use a small portion of the wheel, but just in case one is wondering, we have very little control over when <code>tempfile.SpooledTemporaryFile</code> rolls over, so the memory-disk hybrid is not exactly working as expected.</p>
</li>
<li><p>With all these in mind, all we have to do is define an intermediate object that checks for local availability and downloads missing data on calls to <code>read</code>, to lazily provide the data over HTTP and reduce execution time.</p>
</li>
</ol>
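<p>To illustrate the idea, here is a simplified sketch of such an intermediate object.  This is not the actual implementation in GH-8467: the <code>fetch</code> callback and the naive interval bookkeeping are mine, but it is already enough for <code>zipfile</code> to read a member out of a remote archive:</p>

```python
# Simplified sketch of a lazy, seekable view over a remote file.
# fetch(start, end) is an assumed callback returning bytes for [start, end);
# over HTTP it would issue a request with a 'Range: bytes=...' header.
import io
import zipfile

class LazyRemoteFile(io.RawIOBase):
    def __init__(self, fetch, size):
        self._fetch, self._size = fetch, size
        self._buf = io.BytesIO(b'\0' * size)  # local backing store
        self._have = []                       # downloaded (start, end) pairs

    def seekable(self):
        return True

    def readable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        # seek(offset, SEEK_END) is how zipfile finds the central directory.
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._buf.tell(),
                io.SEEK_END: self._size}[whence]
        return self._buf.seek(base + offset)

    def tell(self):
        return self._buf.tell()

    def read(self, size=-1):
        start = self._buf.tell()
        end = self._size if size < 0 else min(start + size, self._size)
        if not any(s <= start and end <= e for s, e in self._have):
            self._buf.write(self._fetch(start, end))  # fill the missing range
            self._buf.seek(start)
            self._have.append((start, end))  # naive: intervals are not merged
        return self._buf.read(end - start)

# Demo with an in-memory stand-in for the network:
raw = io.BytesIO()
with zipfile.ZipFile(raw, 'w') as archive:
    archive.writestr('demo.dist-info/METADATA', 'Name: demo')
blob = raw.getvalue()
remote = LazyRemoteFile(lambda start, end: blob[start:end], len(blob))
with zipfile.ZipFile(remote) as archive:
    assert archive.read('demo.dist-info/METADATA') == b'Name: demo'
```

The real code must additionally merge and reuse overlapping ranges, which is the interval-tracking problem mentioned below.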
<p>The only theoretical challenge left is to keep track of downloaded intervals, which I finally figured out after a few rounds of trial and error.  The code was submitted as a pull request to <code>pip</code> at <a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>.  A more modern &#40;read: Python 3-only&#41; variant was packaged and uploaded to PyPI under the name of lazip.  I am unaware of any use case for it outside of <code>pip</code>, but it&#39;s certainly fun to play with d-:</p>
<h2 id="whats_next">What&#39;s next?</h2>
<p>I have been falling short of getting the PRs mentioned above merged for quite a while.  With <code>pip</code>&#39;s next beta coming really soon, I have to somehow make the patches reach a certain standard and gain enough attention to be part of the pre-release, since beta-testing would greatly help the success of the GSoC project. To other GSoC students and mentors reading this, I hope your projects turn out successful too&#33;</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/2@cnx%3E&Subject=Re: The Wonderful Wizard of O&#39;zip">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/2@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/2/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Unexpected Things When You&#39;re Expecting</title>
  <link>https://cnx.gdn/blog/2020/gsoc/article/1/index.html</link>
  <guid>https://cnx.gdn/blog/2020/gsoc/article/1/index.html</guid>
  <description>GSoC 2020: Unexpected Things When You&#39;re Expecting</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Tue, 09 Jun 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="unexpected_things_when_youre_expecting">Unexpected Things When You&#39;re Expecting</h1>
<p>Hi everyone, I hope that you are all doing well and wish you all good health&#33; The last week has not been really kind to me, with a decent amount of academic pressure &#40;my school year lasts until early July&#41;. It would be bold to say that I have spent 10 hours working on my GSoC project since the last check-in, let alone the 30 hours per week requirement. That being said, there were still some discoveries that I wish to share.</p>
<div class="franklin-toc"><ol><li>The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</li><li>The change in direction</li></ol></div>
<h2 id="the_multiprocessingdummy_wrapper">The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</h2>
<p>Most of my time was spent finalizing the multi&#123;processing,threading&#125; wrapper for the <code>map</code> function that submits tasks to the worker pool. To my surprise, it is rather difficult to write something that is not only portable but also easy to read and test.</p>
<p>By the time of <a href=https://github.com/pypa/pip/pull/8320>the latest commit</a>, I had realized the following:</p>
<ol>
<li><p>The <code>multiprocessing</code> module was not designed for the implementation details to be abstracted away entirely.  For example, the lazy <code>map</code> could be really slow without specifying a suitable chunk size &#40;to cut the input iterable and distribute the pieces to workers in the pool&#41;. By <em>suitable</em>, I mean only an order of magnitude smaller than the input.  This defeats half of the purpose of making it lazy: allowing the input to be evaluated lazily.  Luckily, in the use case I&#39;m aiming for, the length of the iterable argument is small and the laziness is only needed for the output &#40;to pipeline download and installation&#41;.</p>
</li>
<li><p>Mocking <code>import</code> for testing purposes can never be pretty.  One reason is that we &#40;Python users&#41; have very little control over the calls of <code>import</code> statements and their lower-level implementation <code>__import__</code>. In order to properly patch this built-in function, unlike for others of the same group, we have to <code>monkeypatch</code> the name from <code>builtins</code> &#40;or <code>__builtins__</code> under Python 2&#41; instead of the module that imports stuff. Furthermore, because of the special namespacing, to avoid infinite recursion we need to alias the function to a different name for fallback.</p>
</li>
<li><p>To add to the problem, <code>multiprocessing</code> lazily imports the fragile module during pool creation.  Since the failure is platform-specific &#40;the lack of <code>sem_open</code>&#41;, it was decided to perform the check upon import of <code>pip</code>&#39;s module.  Although the behavior is easier to reason about in human language, testing it requires invalidating the cached import and re-importing the wrapper module.</p>
</li>
<li><p>Last but not least, I now understand the pain of keeping Python 2 compatibility that many package maintainers still need to deal with every day &#40;although Python 2 has reached its end of life, <code>pip</code>, for example, <a href=https://github.com/pypa/pip/pull/6148>will still support it for another year</a>&#41;.</p>
</li>
</ol>
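<p>For a taste of what such a wrapper might look like, here is a minimal sketch &#40;the name and the fallback condition are illustrative, not <code>pip</code>&#39;s actual code&#41;: a lazy, thread-based <code>map</code> that degrades to the sequential built-in when pools are unavailable:</p>

```python
# Illustrative sketch: lazily map func over iterable using a thread pool,
# falling back to the built-in map when pools cannot be created
# (e.g. when the platform lacks sem_open).
try:
    from multiprocessing.dummy import Pool  # thread-based worker pool
except ImportError:
    Pool = None

def map_multithread(func, iterable, chunksize=1):
    """Yield func(item) for each item, in parallel when possible."""
    if Pool is None:
        yield from map(func, iterable)
        return
    with Pool() as pool:
        # imap is lazy on the output side; chunksize controls how the
        # input is cut up and distributed to the workers.
        yield from pool.imap(func, iterable, chunksize)

assert list(map_multithread(lambda x: x * x, range(5))) == [0, 1, 4, 9, 16]
```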
<h2 id="the_change_in_direction">The change in direction</h2>
<p>Since last week, my mentor Pradyun Gedam and I have set up weekly real-time meetings &#40;a fancy term for video/audio chat in the worldwide quarantine era&#41; for the entire GSoC period. During the last session, we decided to put parallelization of download during resolution on hold, in favor of a more beneficial goal: <a href=https://github.com/pypa/pip/pull/7819>partially downloading the wheels during dependency resolution</a>.</p>
<p><img src="https://cnx.gdn/assets/swirl.png" alt="" /></p>
<p>As discussed by Danny McClanahan and the maintainers of <code>pip</code>, it is feasible to download only a few kB of a wheel to obtain enough metadata for dependency resolution.  While this is only applicable to wheels &#40;i.e. prebuilt packages&#41;, other packaging formats make up less than 20&#37; of downloads &#40;at least on PyPI&#41;, and the figure is much lower for the most popular packages.  Therefore, this optimization alone could make <a href="https://www.ei8fdb.org/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts">the upcoming backtracking resolver</a>&#39;s performance on par with the legacy one.</p>
<p>During the last few years, a lot of effort has been poured into replacing <code>pip</code>&#39;s current resolver, which is unable to resolve conflicts. While its correctness will be ensured by some of the most talented and hard-working developers in the Python packaging community, from the users&#39; point of view, it would be better to have its performance not lag behind the old one.  Aside from the increase in CPU cycles for more rigorous resolution, more I/O, especially networking operations, is expected to be performed.  This is due to <a href=https://github.com/pypa/pip/pull/7406#issuecomment-583891169>the lack of a standard and efficient way to acquire the metadata</a>.  Therefore, unlike most package managers we are familiar with, <code>pip</code> has to fetch &#40;and possibly build&#41; the packages solely for dependency information.</p>
<p>Fortunately, <a href=https://www.python.org/dev/peps/pep-0427#recommended-archiver-features>PEP 427</a> recommends package builders to place the metadata at the end of the archive. This allows the resolver to fetch only the last few kB using HTTP range requests for the relevant information. Simply appending <code>Range: bytes&#61;-8000</code> to the request header in <code>pip._internal.network.download</code> makes the resolution process <em>lightning</em> fast.  Of course this breaks the installation, but I am confident that it is not difficult to implement this optimization cleanly.</p>
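<p>For illustration, forming such a suffix range request takes only the standard library &#40;the URL below is a made-up placeholder, and this is of course not <code>pip</code>&#39;s internal API&#41;:</p>

```python
# A suffix range request asks the server for only the last N bytes;
# the URL is a hypothetical wheel, not a real one.
from urllib.request import Request

request = Request('https://files.example.org/demo-1.0-py3-none-any.whl',
                  headers={'Range': 'bytes=-8000'})
assert request.get_header('Range') == 'bytes=-8000'
```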
<p>One drawback of this optimization is compatibility.  Not every Python package index supports range requests, and it is not possible to verify a partially downloaded wheel against its hash.  While the first case is unavoidable, the second matters little in practice: hash checking is usually used for pinned/locked-version requirements, where no backtracking is done during dependency resolution.</p>
<p>Either way, before installation, the packages selected by the resolver can be downloaded in parallel.  This guarantees a larger pool of packages to fetch concurrently, compared to parallelization during resolution, where the number of downloads can be as low as one during trials of different versions of the same package.</p>
<p>Unfortunately, I have not been able to do much other than <a href=https://github.com/pypa/pip/pull/8411>a minor clean up</a>.  I am looking forward to accomplishing more this week and seeing where this path will lead us&#33;  At the moment, I am happy that I&#39;m able to meet the blog deadline, at least in UTC&#33;</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/1@cnx%3E&Subject=Re: Unexpected Things When You&#39;re Expecting">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/1@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/2020/gsoc/article/1/comments.xml</wfw:commentRss>
</item>
<item>
  <title>System Cascade Connection</title>
  <link>https://cnx.gdn/blog/system/index.html</link>
  <guid>https://cnx.gdn/blog/system/index.html</guid>
  <description>Properties of cascade connected systems analyzed via anonymous functions</description>
  <category>fun</category><category>math</category>
  <pubDate>Wed, 15 Apr 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="system_cascade_connection">System Cascade Connection</h1>
<p>Given two discrete-time systems \(A\) and \(B\) connected in cascade to form a new system \(C = x \mapsto B(A(x))\), we examine the following properties:</p>
<div class="franklin-toc"><ol><li>Linearity</li><li>Time Invariance</li><li>LTI Ordering</li><li>Causality</li><li>BIBO Stability</li></ol></div>
<h2 id="linearity">Linearity</h2>
<p>If \(A\) and \(B\) are linear, i.e. for all signals \(x_i\) and scalars \(a_i\),</p>
\[\begin{aligned}
  A\left(n \mapsto \sum_i a_i x_i[n]\right) = n \mapsto \sum_i a_i A(x_i)[n]\\
  B\left(n \mapsto \sum_i a_i x_i[n]\right) = n \mapsto \sum_i a_i B(x_i)[n]
\end{aligned}\]
<p>then \(C\) is also linear</p>
\[\begin{aligned}
  C\left(n \mapsto \sum_i a_i x_i[n]\right)
  &= B\left(A\left(n \mapsto \sum_i a_i x_i[n]\right)\right)\\
  &= B\left(n \mapsto \sum_i a_i A(x_i)[n]\right)\\
  &= n \mapsto \sum_i a_i B(A(x_i))[n]\\
  &= n \mapsto \sum_i a_i C(x_i)[n]
\end{aligned}\]
<h2 id="time_invariance">Time Invariance</h2>
<p>If \(A\) and \(B\) are time invariant, i.e. for all signals \(x\) and integers \(k\),</p>
\[\begin{aligned}
  A(n \mapsto x[n - k]) &= n \mapsto A(x)[n - k]\\
  B(n \mapsto x[n - k]) &= n \mapsto B(x)[n - k]
\end{aligned}\]
<p>then \(C\) is also time invariant</p>
\[\begin{aligned}
  C(n \mapsto x[n - k])
  &= B(A(n \mapsto x[n - k]))\\
  &= B(n \mapsto A(x)[n - k])\\
  &= n \mapsto B(A(x))[n - k]\\
  &= n \mapsto C(x)[n - k]
\end{aligned}\]
<h2 id="lti_ordering">LTI Ordering</h2>
<p>If \(A\) and \(B\) are linear and time-invariant, there exist signals \(g\) and \(h\) such that \(A = x \mapsto x * g\) and \(B = x \mapsto x * h\), thus </p>
\[B(A(x)) = B(x * g) = x * g * h = x * h * g = A(x * h) = A(B(x))\]
<p>that is, interchanging the order of \(A\) and \(B\) does not change \(C\).</p>
<h2 id="causality">Causality</h2>
<p>If \(A\) and \(B\) are causal, i.e. for all signals \(x\), \(y\) and any choice of integer \(k\),</p>
\[\begin{aligned}
  \forall n < k, x[n] = y[n]\quad
  \Longrightarrow &\;\begin{cases}
  \forall n < k, A(x)[n] = A(y)[n]\\
  \forall n < k, B(x)[n] = B(y)[n]
  \end{cases}\\
  \Longrightarrow &\;\forall n < k, B(A(x))[n] = B(A(y))[n]\\
  \Longleftrightarrow &\;\forall n < k, C(x)[n] = C(y)[n]
\end{aligned}\]
<p>then \(C\) is also causal.</p>
<h2 id="bibo_stability">BIBO Stability</h2>
<p>If \(A\) and \(B\) are stable, i.e. for every signal \(x\) bounded by a scalar \(a\), there exists a scalar \(b\) such that for all integers \(n\),</p>
\[\begin{aligned}
  |x[n]| < a &\Longrightarrow |A(x)[n]| < b\\
  |x[n]| < a &\Longrightarrow |B(x)[n]| < b
\end{aligned}\]
<p>then \(C\) is also stable, i.e. for every signal \(x\) bounded by a scalar \(a\), there exist scalars \(b\) and \(c\) such that for all integers \(n\),</p>
\[\begin{aligned}
  |x[n]| < a\quad
  \Longrightarrow &\;|A(x)[n]| < b\\
  \Longrightarrow &\;|B(A(x))[n]| < c\\
  \Longleftrightarrow &\;|C(x)[n]| < c
\end{aligned}\]    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/system@cnx%3E&Subject=Re: System Cascade Connection">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/system@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/system/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Infinite Sequences: A Case Study in Functional Python</title>
  <link>https://cnx.gdn/blog/conseq/index.html</link>
  <guid>https://cnx.gdn/blog/conseq/index.html</guid>
  <description>SICP subsection 3.5.2 in Python</description>
  <category>fun</category><category>math</category><category>python</category>
  <pubDate>Thu, 28 Feb 2019 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1>Infinite Sequences: A Case Study in Functional Python</h1>
<p>In this article, we will only consider sequences defined by a function whose domain is a subset of the set of all integers.  Such sequences will be <em>visualized</em>, i.e. we will try to evaluate the first few &#40;thousand&#41; elements, using the functional programming paradigm, where functions are more similar to the ones in math &#40;in contrast to the imperative style, whose side effects confuse inexperienced coders&#41;.  The idea is taken from <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-24.html#&#37;_sec_3.5.2">subsection 3.5.2 of SICP</a> and adapted to Python, which, compared to Scheme, is significantly more popular: Python is pre-installed on almost every modern Unix-like system, namely macOS, GNU/Linux and the &#42;BSDs; and even at MIT, the new 6.01 in Python has recently replaced the legendary 6.001 &#40;SICP&#41;.</p>
<p>One notable advantage of using Python is its huge <strong>standard</strong> library. For example the <em>identity sequence</em> &#40;sequence defined by the identity function&#41; can be imported directly from <code>itertools</code>:</p>
<pre><code class="language-python">&gt;&gt;&gt; from itertools import count
&gt;&gt;&gt; positive_integers &#61; count&#40;start&#61;1&#41;
&gt;&gt;&gt; next&#40;positive_integers&#41;
1
&gt;&gt;&gt; next&#40;positive_integers&#41;
2
&gt;&gt;&gt; for _ in range&#40;4&#41;: next&#40;positive_integers&#41;
... 
3
4
5
6</code></pre>
<p>To open a Python interpreter, simply launch your terminal and run <code>python</code>. If that is somehow still too much of a struggle, navigate to <a href="https://www.python.org/shell">the interactive shell</a> on Python.org.</p>
<p><em>Let&#39;s get it started</em> with something everyone hates: recursively defined sequences, e.g. the famous Fibonacci &#40;\(F_n = F_{n-1} + F_{n-2}\), \(F_1 = 1\) and \(F_0 = 0\)&#41;.  Since <a href="https://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html">Python does not support</a> <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-11.html#call_footnote_Temp_48">tail recursion</a>, it&#39;s generally <strong>not</strong> a good idea to define anything recursively &#40;which is, ironically, the only trivial <em>functional</em> solution in this case&#41;, but since we will only evaluate the first few terms &#40;use the <strong>Tab</strong> key to indent the line when needed&#41;:</p>
<pre><code class="language-python">&gt;&gt;&gt; def fibonacci&#40;n, a&#61;0, b&#61;1&#41;:
...     # To avoid making the code look complicated,
...     # n &lt; 0 is not handled here.
...     return a if n &#61;&#61; 0 else fibonacci&#40;n - 1, b, a &#43; b&#41;
... 
&gt;&gt;&gt; fibo_seq &#61; &#40;fibonacci&#40;n&#41; for n in count&#40;start&#61;0&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;7&#41;: next&#40;fibo_seq&#41;
... 
0
1
1
2
3
5
8</code></pre>
<div class="admonition note"><p class="admonition-title">Note</p><p>The <code>fibo_seq</code> above is just to demonstrate how <code>itertools.count</code> can be used to create an infinite sequence defined by a function. For better performance, the following should be used instead:</p>
<pre><code class="language-python">def fibonacci_sequence&#40;a&#61;0, b&#61;1&#41;:
    yield a
    yield from fibonacci_sequence&#40;b, a&#43;b&#41;</code></pre>
</div>
<p>It is noticeable that the elements having been iterated through &#40;using <code>next</code>&#41; disappear forever into the void &#40;oh no&#33;&#41;, but that is the cost we are willing to pay to save some memory, especially when we need to evaluate a member of &#40;arbitrarily&#41; large index to estimate the sequence&#39;s limit. One case in point is estimating a definite integral using a <a href="https://en.wikipedia.org/wiki/Riemann_sum#Left_Riemann_sum">left Riemann sum</a>.</p>
<pre><code class="language-python">def integral&#40;f, a, b&#41;:
    def left_riemann_sum&#40;n&#41;:
        dx &#61; &#40;b-a&#41; / n
        def x&#40;i&#41;: return a &#43; i*dx
        return sum&#40;f&#40;x&#40;i&#41;&#41; for i in range&#40;n&#41;&#41; * dx
    return left_riemann_sum</code></pre>
<p>The function <code>integral&#40;f, a, b&#41;</code> as defined above returns a function taking \(n\) as an argument.  As \(n\to\infty\), its result approaches \(\int_a^b f(x)\mathrm d x\).  For example, we are going to estimate \(\pi\) as the area of a semicircle whose radius is \(\sqrt 2\):</p>
<pre><code class="language-python">&gt;&gt;&gt; from math import sqrt
&gt;&gt;&gt; def semicircle&#40;x&#41;: return sqrt&#40;abs&#40;2 - x*x&#41;&#41;
... 
&gt;&gt;&gt; pi &#61; integral&#40;semicircle, -sqrt&#40;2&#41;, sqrt&#40;2&#41;&#41;
&gt;&gt;&gt; pi_seq &#61; &#40;pi&#40;n&#41; for n in count&#40;start&#61;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;3&#41;: next&#40;pi_seq&#41;
... 
2.000000029802323
2.514157464087051
2.7320508224700384</code></pre>
<p>Whilst the first few aren&#39;t quite close, at indices around 1000, the results are somewhat acceptable:</p>
<pre><code class="language-python">3.1414873191059525
3.1414874770617427
3.1414876346231577</code></pre>
<p>Since we are comfortable with sequences of sums, let&#39;s move on to sums of a sequence, which are called series.  For estimation, again, we are going to make use of infinite sequences of partial sums, which are implemented as <code>itertools.accumulate</code> by thoughtful Python developers.  <a href="https://en.wikipedia.org/wiki/Geometric_series">Geometric</a> and <a href="https://math.oregonstate.edu/home/programs/undergrad/CalculusQuestStudyGuides/SandS/SeriesTests/p-series.html">p-series</a> can be defined as follows:</p>
<pre><code class="language-python">from itertools import accumulate as partial_sums

def geometric_series&#40;r, a&#61;1&#41;:
    return partial_sums&#40;a*r**n for n in count&#40;0&#41;&#41;

def p_series&#40;p&#41;:
    return partial_sums&#40;1 / n**p for n in count&#40;1&#41;&#41;</code></pre>
<p>We can then use these to determine whether a series is convergent or divergent. For instance, one can easily verify that the \(p\)-series with \(p = 2\) converges to \(\pi^2 / 6 \approx 1.6449340668482264\) via</p>
<pre><code class="language-python">&gt;&gt;&gt; s &#61; p_series&#40;p&#61;2&#41;
&gt;&gt;&gt; for _ in range&#40;11&#41;: next&#40;s&#41;
... 
1.0
1.25
1.3611111111111112
1.4236111111111112
1.4636111111111112
1.4913888888888889
1.511797052154195
1.527422052154195
1.5397677311665408
1.5497677311665408
1.558032193976458</code></pre>
<p>We can observe that it takes quite a lot of steps to get the precision we would generally expect &#40;\(s_{11}\) is only precise to the first decimal place; second decimal place: \(s_{101}\); third: \(s_{2304}\)&#41;. Luckily, many techniques for series acceleration are available. The <a href="https://en.wikipedia.org/wiki/Shanks_transformation">Shanks transformation</a>, for instance, can be implemented as follows:</p>
<pre><code class="language-python">from itertools import islice, tee

def shanks&#40;seq&#41;:
    return map&#40;lambda x, y, z: &#40;x*z - y*y&#41; / &#40;x &#43; z - y*2&#41;,
               *&#40;islice&#40;t, i, None&#41; for i, t in enumerate&#40;tee&#40;seq, 3&#41;&#41;&#41;&#41;</code></pre>
<p>In the code above, <code>lambda x, y, z: &#40;x*z - y*y&#41; / &#40;x &#43; z - y*2&#41;</code> denotes the anonymous function \((x, y, z) \mapsto \frac{xz - y^2}{x + z - 2y}\) and <code>map</code> is a higher order function applying that function to respective elements of subsequences starting from index 1, 2 and 3 of <code>seq</code>. On Python 2, one should import <code>imap</code> from <code>itertools</code> to get the same <a href="https://en.wikipedia.org/wiki/Lazy_evaluation">lazy</a> behavior of <code>map</code> on Python 3.</p>
<pre><code class="language-python">&gt;&gt;&gt; s &#61; shanks&#40;p_series&#40;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;s&#41;
... 
1.4500000000000002
1.503968253968257
1.53472222222223
1.5545202020202133
1.5683119658120213
1.57846371882088
1.5862455815659202
1.5923993101138652
1.5973867787856946
1.6015104548459742</code></pre>
<p>The result was quite satisfying, yet we can go one step further by continuously applying the transformation to the sequence:</p>
<pre><code class="language-python">&gt;&gt;&gt; def compose&#40;transform, seq&#41;:
... 	yield next&#40;seq&#41;
... 	yield from compose&#40;transform, transform&#40;seq&#41;&#41;
... 
&gt;&gt;&gt; s &#61; compose&#40;shanks, p_series&#40;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;s&#41;
... 
1.0
1.503968253968257
1.5999812811165188
1.6284732442271674
1.6384666832276524
1.642311342667821
1.6425249569252578
1.640277484549416
1.6415443295058203
1.642038043478661</code></pre>
<p>The Shanks transformation works on every sequence &#40;not just sequences of partial sums&#41;.  Back to the previous example of using a left Riemann sum to compute a definite integral:</p>
<pre><code class="language-python">&gt;&gt;&gt; pi_seq &#61; compose&#40;shanks, map&#40;pi, count&#40;2&#41;&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;pi_seq&#41;
... 
2.000000029802323
2.978391111182236
3.105916845397819
3.1323116570377185
3.1389379264270736
3.140788413965646
3.140921512857936
3.1400282163913436
3.1400874774021816
3.1407097229603256
&gt;&gt;&gt; next&#40;islice&#40;pi_seq, 300, None&#41;&#41;
3.1415061302492413</code></pre>
<p>Now having series defined, let&#39;s see if we can learn anything about power series. The sequence of partial sums of the power series \(\sum c_n (x - a)^n\) can be defined as</p>
<pre><code class="language-python">from operator import mul

def power_series&#40;c, start&#61;0, a&#61;0&#41;:
    return lambda x: partial_sums&#40;map&#40;mul, c, &#40;&#40;x - a&#41;**n for n in count&#40;start&#41;&#41;&#41;&#41;</code></pre>
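<p>As a quick sanity check &#40;my hypothetical example, not in the original post&#41;, the all-ones coefficient sequence gives the geometric series, whose sum at \(x = 0.5\) should approach 2; <code>partial_sums</code> is redefined here with its assumed behavior to keep the snippet self-contained:</p>

```python
from itertools import count, islice, repeat
from operator import mul

def partial_sums(seq):
    # Assumed behavior of the helper defined earlier in the post.
    total = 0
    for term in seq:
        total += term
        yield total

def power_series(c, start=0, a=0):
    return lambda x: partial_sums(map(mul, c, ((x - a)**n for n in count(start))))

# Geometric series: with all coefficients 1, the partial sums at x
# approach 1/(1 - x) wherever the series converges.
geometric = power_series(repeat(1))
print(next(islice(geometric(0.5), 50, None)))  # ~2.0
```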
<p>We can use this to compute functions that can be written as <a href="https://en.wikipedia.org/wiki/Taylor_series">Taylor series</a>:</p>
<pre><code class="language-python">from math import factorial

def exp&#40;x&#41;:
    return power_series&#40;1/factorial&#40;n&#41; for n in count&#40;0&#41;&#41;&#40;x&#41;

def cos&#40;x&#41;:
    c &#61; &#40;&#40;1 - n&#37;2&#41; * &#40;1 - n&#37;4&#41; / factorial&#40;n&#41; for n in count&#40;0&#41;&#41;
    return power_series&#40;c&#41;&#40;x&#41;

def sin&#40;x&#41;:
    c &#61; &#40;n&#37;2 * &#40;2 - n&#37;4&#41; / factorial&#40;n&#41; for n in count&#40;1&#41;&#41;
    return power_series&#40;c, start&#61;1&#41;&#40;x&#41;</code></pre>
<p>Amazing&#33;  Let&#39;s test &#39;em&#33;</p>
<pre><code class="language-python">&gt;&gt;&gt; e &#61; compose&#40;shanks, exp&#40;1&#41;&#41; # this should converges to 2.718281828459045
&gt;&gt;&gt; for _ in range&#40;4&#41;: next&#40;e&#41;
... 
1.0
2.749999999999996
2.718276515152136
2.718281825486623</code></pre>
<p>Impressive, huh? For sine and cosine, series acceleration is not even necessary:</p>
<pre><code class="language-python">&gt;&gt;&gt; from math import pi as PI
&gt;&gt;&gt; s &#61; sin&#40;PI/6&#41;
&gt;&gt;&gt; for _ in range&#40;5&#41;: next&#40;s&#41;
... 
0.5235987755982988
0.5235987755982988
0.49967417939436376
0.49967417939436376
0.5000021325887924
&gt;&gt;&gt; next&#40;islice&#40;cos&#40;PI/3&#41;, 8, None&#41;&#41;
0.500000433432915</code></pre>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/conseq@cnx%3E&Subject=Re: Infinite Sequences: A Case Study in Functional Python">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/conseq@cnx%3E]]></comments>
  <wfw:commentRss>https://cnx.gdn/blog/conseq/comments.xml</wfw:commentRss>
</item>
</channel></rss>
