<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Arnaud's blog]]></title><description><![CDATA[Arnaud's blog]]></description><link>https://blog.angelside.net</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 09:29:25 GMT</lastBuildDate><atom:link href="https://blog.angelside.net/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Trump is an aggressor, plain and simple]]></title><description><![CDATA[While the snowflakes peacefully float down in front of my window, the world is again in turmoil. Just today, I finished writing a technical article that I had worked on for several days, but it was overshadowed by a bitter taste due to the latest events....]]></description><link>https://blog.angelside.net/trump-is-an-aggressor-plain-and-simple</link><guid isPermaLink="true">https://blog.angelside.net/trump-is-an-aggressor-plain-and-simple</guid><category><![CDATA[Trump]]></category><category><![CDATA[Venezuela]]></category><category><![CDATA[politics]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Tue, 06 Jan 2026 17:26:00 GMT</pubDate><content:encoded><![CDATA[<p>While the snowflakes peacefully float down in front of my window, the world is again in turmoil. Just today, I finished writing a technical article that I had worked on for several days, but it was overshadowed by a bitter taste due to the latest events.</p>
<p>To put it bluntly, after stealing oil tankers like pirates and bombing ships, Trump’s latest stunt was basically “Let’s kidnap the president to better plunder Venezuela’s oil”! Of course, he won’t call it that. It’s “for freedom” and to “fight drugs”. But heck, do that in your own country! This is a military putsch in another country and has nothing remotely to do with democracy. Venezuela has the largest oil deposits in the world, and Trump is eyeing the gas pump.</p>
<p>Sadly, the whole world only watches, criticizing it but too weak to oppose the bullying of its American big bro. I’m kind of ashamed that Europe, and in particular Germany, had such a low-key response to this aggression. It’s like looking away to preserve one’s own self-interest. They probably also don’t want to get on Trump’s nerves because they depend on the US, including its help in Ukraine. I just hope this greed and madness won’t escalate further.</p>
<p>I also noticed, while writing the technical article today, that Europe is extremely dependent on the US. Not in the usual sense of trading, supply chains, military and the other usual metrics, but “technologically”. Everyone here uses Windows, Google, Apple, Amazon, ChatGPT, GitHub, WhatsApp, US cloud infrastructure and services. Pull the plug and nothing works anymore in Europe. It’s not like China or Russia, which have strategic alternatives for almost everything, from their search engines and chat apps to Alibaba Cloud and the like. Pull the plug there and they’ll adapt after some disruption. That is a dependency that is rarely considered geopolitically, despite appearing to me as the huge showstopper. We are great allies all the time, after all!</p>
<p>Now I may sound paranoid, but I think that in this new, quickly evolving era, it’s time for Europe to become more technologically independent, so as not to suffer the whims of its American big bro. Sure, we are “allies” and I think we always will be. But this technological dependence is also a way to exert pressure and control, whether in “America first” tariff negotiations or to coerce Europe in some other way. And who knows how it will be ten or twenty years from now. If we don’t want to get the short end of the stick, I think it makes sense for Europe to also invest in and use European technological alternatives. I always considered myself a “world citizen”, so it saddens me to speak like that, truly. I also thought the era of peace and reason was upon us …but somehow the political world seems to be going nuts.</p>
<p>Let’s hope for the best, peace and love!</p>
]]></content:encoded></item><item><title><![CDATA[Cloudflare Workers performance: an experiment with Astro and worldwide latencies]]></title><description><![CDATA[Why use Cloudflare Workers?
Cloudflare Workers let you host pages and run code without managing servers. Unlike traditional servers placed in a single or a few locations, the deployed static assets and code are mirrored around the globe in the data c...]]></description><link>https://blog.angelside.net/cloudflare-workers-performance-an-experiment-with-astro-and-worldwide-latencies</link><guid isPermaLink="true">https://blog.angelside.net/cloudflare-workers-performance-an-experiment-with-astro-and-worldwide-latencies</guid><category><![CDATA[cloudflare-worker]]></category><category><![CDATA[edge computing]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Tue, 06 Jan 2026 14:45:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767711723825/50347090-c85a-498d-bac9-280d9833a3b3.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-use-cloudflare-workers">Why use Cloudflare Workers?</h2>
<p><a target="_blank" href="https://workers.cloudflare.com/">Cloudflare Workers</a> let you host pages and run code without managing servers. Unlike traditional servers placed in a single or a few locations, the deployed static assets and code are mirrored around the globe in the data centers shown as blue dots below. Naturally, this offers better latencies, scalability and robustness.</p>
<p><img src="https://www.cloudflare.com/network-maps/cloudflare-pops-2O04nulSdNrRpJR9fs9OKv.svg" alt="network map svg" /></p>
<p>Their developer platform also extends beyond “Workers” (the compute part) and includes storage, databases, queues, AI and lots of other developer tooling, all with a generous free tier and reasonable pricing beyond that.</p>
<p>Why am I writing this? I find it fairly good, I had a good experience with it, and that’s why I present it here. This article is not sponsored in any way. I just think it’s somehow a responsibility of developers to communicate about the tools they use in order to keep their ecosystem lively. I’ve seen too much good stuff get abandoned because there was no “buzz”.</p>
<p>The benefits of using Cloudflare Workers are:</p>
<ul>
<li><p>Great latencies worldwide</p>
</li>
<li><p>Unlimited scalability</p>
</li>
<li><p>No servers to take care of</p>
</li>
<li><p>Further tooling for data, files, AI, etc.</p>
</li>
<li><p>Preview URLs for GitHub pull requests</p>
</li>
<li><p>Free tier good enough for most hobby projects</p>
</li>
</ul>
<h2 id="heading-when-not-to-use-it">When not to use it</h2>
<p>Like every tool, it has use cases for which it shines and others it is not suited for. This is important to grasp, and understanding the <a target="_blank" href="https://developers.cloudflare.com/workers/reference/how-workers-works/">underlying technology</a> helps tremendously. Basically, it loads your whole app bundled as a script and evaluates it on the fly. It’s fast and works wonderfully if your API and the frameworks you use are slim and minimalistic. However, it would be ill-advised in the following use cases:</p>
<ul>
<li><p><strong><em>Large complex apps</em></strong><br />  The cost of evaluating your API / SSR script will grow as your app grows. The larger it becomes, the more inefficient its invocation as a whole will be. There are also some <a target="_blank" href="https://developers.cloudflare.com/workers/platform/limits/#worker-size">limits</a> on how large your “script” can be. Although this limit has been raised multiple times in the past, evaluating a huge script will always remain inefficient. Thus, be careful when picking dependencies/frameworks, since they can quickly bloat your codebase.</p>
</li>
<li><p><strong><em>Heavy resource consumption</em></strong><br />  Due to its nature, it is not suited to computations requiring large amounts of CPU/RAM/time like statistical models or scientific computing. Large caches are problematic too. Waiting for long-running async server-side requests is OK though: the execution is suspended in-between and does not count towards execution time.</p>
</li>
<li><p><strong><em>Long-lived connections</em></strong></p>
<p>  That’s also problematic. You should rather use polling than keeping connections open.</p>
</li>
</ul>
<blockquote>
<p>In other words: “The slimmer, the better!”</p>
</blockquote>
<p>It’s difficult to say exactly what’s small enough and when it becomes too large. Workers are rather suited to small, self-contained microservices of modest size. Even debugging using breakpoints might turn out challenging. For larger applications, traditional server deployments would be more suited.</p>
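<p>To make the “slim script” idea concrete, here is a hedged sketch of what a minimal Worker looks like. The route loosely mirrors the <code>/api/time</code> endpoint benchmarked later, but the code itself is illustrative, not the app’s actual source: the whole app is one module whose <code>fetch</code> handler gets evaluated on demand.</p>

```javascript
// Minimal Worker sketch: the entire app is one module with a fetch handler.
// The smaller this script, the cheaper it is to load and evaluate at the edge.
const worker = {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === "/api/time") {
      // Stateless endpoint: no storage involved, just compute and respond.
      return Response.json({ time: new Date().toISOString() });
    }
    return new Response("Not found", { status: 404 });
  },
};

export default worker;
```

<p>Everything your bundler pulls into this module counts towards the script size, which is why lean dependencies matter so much here.</p>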
<h2 id="heading-what-will-we-build">What will we build?</h2>
<p>A “<a target="_blank" href="https://quoted.day/">Quote of the Day</a>” Web application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765199463823/b5331dcc-2ecf-4335-b300-76780ce836fb.png" alt class="image--center mx-auto" /></p>
<p>The purpose is not to build something big, but rather a simple proof-of-concept. The quotes will be stored in a KV store and fetched client-side. That way, we can measure how fast the whole thing works and whether it lives up to expectations.</p>
<p>The default version of <a target="_blank" href="https://quoted.day">https://quoted.day</a> is available in two flavours:</p>
<ul>
<li><p><a target="_blank" href="https://quoted.day/spa">https://quoted.day/spa</a>: a static page, fetching the quote text/author asynchronously</p>
</li>
<li><p><a target="_blank" href="https://quoted.day/ssr">https://quoted.day/ssr</a>: Server-Side-Rendering, rendering the page with the quote on the server</p>
</li>
</ul>
<p>I swapped which one is the default from time to time to perform experiments. Performance (latency) may vary depending on where you are located and whether what you fetch is “hot” or “cold”. Before we delve into the details of how to build such an app, let’s take a look at the performance we can expect.</p>
<h2 id="heading-benchmarking-latencies-worldwide">Benchmarking latencies worldwide</h2>
<p>Unlike the <em>internal</em> Cloudflare latency measures, which are measured “inside” the worker and therefore quite optimistic, we will look at the “real” <em>external</em> latency thanks to the great tool <a target="_blank" href="https://www.openstatus.dev/play/checker">https://www.openstatus.dev/play/checker</a>.</p>
<p>Thanks to that, we can obtain a pretty good idea of the overall latencies that can be observed all over the world. Note however that Australia, Asia and Africa may have rather erratic latencies that “jump” sometimes.</p>
<p>We will also benchmark multiple things separately:</p>
<ul>
<li><p>Static assets</p>
</li>
<li><p>Stateless functions</p>
</li>
<li><p>Hot KV read</p>
</li>
<li><p>Cold KV read</p>
</li>
<li><p>KV writes</p>
</li>
</ul>
<p>Also, every case gets “two passes”, hopefully filling caches along the way, and only the second one is recorded.</p>
<h3 id="heading-static-assets"><em>Static assets</em></h3>
<p>This was obtained by fetching the main page at <a target="_blank" href="https://quoted.day/spa">https://quoted.day/spa</a></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Region</strong></td><td><strong>Latency</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>30ms</td></tr>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>31ms</td></tr>
<tr>
<td>🇫🇷 cdg Paris, France</td><td>33ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>33ms</td></tr>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>31ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>32ms</td></tr>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>31ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>54ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>32ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>36ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>35ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>50ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>36ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>28ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>26ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>41ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>49ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>29ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>53ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>95ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>30ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>28ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>31ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>294ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>436ms</td></tr>
<tr>
<td>🇧🇷 gru Sao Paulo, Brazil</td><td>252ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>559ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>28ms</td></tr>
</tbody>
</table>
</div><h3 id="heading-stateless-function"><em>Stateless function</em></h3>
<p>This is obtained by fetching the endpoint <a target="_blank" href="https://quoted.day/api/time">https://quoted.day/api/time</a>, which simply returns the current time.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Region</strong></td><td><strong>Latency</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>38ms</td></tr>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>32ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>36ms</td></tr>
<tr>
<td>🇫🇷 cdg Paris, France</td><td>75ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>76ms</td></tr>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>88ms</td></tr>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>73ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>97ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>36ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>62ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>95ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>39ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>25ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>92ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>90ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>22ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>108ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>99ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>45ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>27ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>99ms</td></tr>
<tr>
<td>🇧🇷 gru Sao Paulo, Brazil</td><td>89ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>26ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>220ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>26ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>540ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>354ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>71ms</td></tr>
</tbody>
</table>
</div><h3 id="heading-hot-kv-read"><em>Hot KV read</em></h3>
<p>This is obtained by fetching a fixed quote from the KV store using the endpoint <a target="_blank" href="https://quoted.day/api/quote/123">https://quoted.day/api/quote/123</a></p>
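<p>For illustration, a handler for such an endpoint might look roughly like the following sketch. The <code>QUOTES</code> binding name matches the one mentioned at the end of this article, but the key scheme, data shape and route handling are assumptions, not the actual app’s code.</p>

```javascript
// Hypothetical handler for GET /api/quote/:id, backed by a KV namespace.
// `env.QUOTES` is the KV binding declared in wrangler.json. The `cacheTtl`
// option (60s minimum) controls how long the fetched pair stays cached at
// the edge location, which is what makes repeated "hot" reads fast.
async function handleQuote(env, id) {
  const quote = await env.QUOTES.get(`quote:${id}`, { type: "json", cacheTtl: 60 });
  if (quote === null) {
    return new Response("Quote not found", { status: 404 });
  }
  return Response.json(quote);
}
```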
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Region</strong></td><td><strong>Latency</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>34ms</td></tr>
<tr>
<td>🇫🇷 cdg Paris, France</td><td>39ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>35ms</td></tr>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>37ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>34ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>77ms</td></tr>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>103ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>25ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>33ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>55ms</td></tr>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>168ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>106ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>52ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>122ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>33ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>123ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>43ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>99ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>88ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>69ms</td></tr>
<tr>
<td>🇧🇷 gru Sao Paulo, Brazil</td><td>99ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>40ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>64ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>91ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>345ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>126ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>65ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>856ms</td></tr>
</tbody>
</table>
</div><h3 id="heading-cold-kv-read"><em>Cold KV read</em></h3>
<p>This is obtained by fetching a random quote from the KV store using the endpoint <a target="_blank" href="https://quoted.day/api/quote">https://quoted.day/api/quote</a></p>
<p>Note that each call will cache the result for a day at the edge location, possibly turning cold reads into hot reads as traffic increases.</p>
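<p>In code, that day-long edge caching presumably boils down to the <code>cacheTtl</code> option on the KV read. A hedged sketch, with an illustrative key scheme and quote count (not the app’s actual source):</p>

```javascript
// Hypothetical handler for GET /api/quote: picks a random quote id and reads
// it from KV. `cacheTtl: 86400` asks the edge location to keep the pair cached
// for a day, so a cold read in one region gradually turns into a hot read.
async function handleRandomQuote(env, totalQuotes) {
  const id = Math.floor(Math.random() * totalQuotes);
  const quote = await env.QUOTES.get(`quote:${id}`, { type: "json", cacheTtl: 86400 });
  if (quote === null) {
    return new Response("Quote not found", { status: 404 });
  }
  return Response.json(quote);
}
```

<p>The trade-off of such a long <code>cacheTtl</code> is staleness: an updated quote would not be visible at that location until the cached copy expires.</p>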
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Region</strong></td><td><strong>Latency</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>131ms</td></tr>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>105ms</td></tr>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>110ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>130ms</td></tr>
<tr>
<td>🇫🇷 cdg Paris, France</td><td>145ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>134ms</td></tr>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>127ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>133ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>197ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>201ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>220ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>243ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>229ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>287ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>270ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>288ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>245ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>502ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>322ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>323ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>247ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>217ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>300ms</td></tr>
<tr>
<td>🇧🇷 gru Sao Paulo, Brazil</td><td>601ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>822ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>574ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>335ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>964ms</td></tr>
</tbody>
</table>
</div><h3 id="heading-kv-writes"><em>KV writes</em></h3>
<p>This is obtained by fetching <a target="_blank" href="http://quoted.day/api/bump-counter">quoted.day/api/bump-counter</a> which creates a temporary KV pair with an expiration time of 10 minutes. It kind of emulates the concept of initiating a “session”.</p>
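<p>A hedged sketch of what such a write might look like (binding and key names are illustrative): the <code>expirationTtl</code> option, in seconds, makes the pair expire automatically, much like a short-lived session entry. Note that KV is eventually consistent, so this read-modify-write is not a safe atomic counter; it merely emulates a session-like write.</p>

```javascript
// Hypothetical handler for /api/bump-counter: reads a counter, increments it
// and writes it back with a 10-minute expiration. Writes like this one travel
// to the "origin" storage, hence the higher latencies measured below.
async function bumpCounter(env) {
  const current = parseInt((await env.QUOTES.get("counter")) ?? "0", 10);
  const next = current + 1;
  // expirationTtl is in seconds: 600s = 10 minutes, then the pair vanishes.
  await env.QUOTES.put("counter", String(next), { expirationTtl: 600 });
  return Response.json({ counter: next });
}
```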
<div class="hn-table">
<table>
<thead>
<tr>
<td><a target="_blank" href="https://quoted.day/api/bump-counter">🇫🇷 cdg Paris, France</a></td><td>128ms</td></tr>
</thead>
<tbody>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>151ms</td></tr>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>147ms</td></tr>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>194ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>145ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>240ms</td></tr>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>176ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>212ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>238ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>305ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>295ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>408ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>423ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>337ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>359ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>409ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>335ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>347ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>438ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>247ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>508ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>684ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>713ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>734ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>1,259ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>1,139ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>2,266ms</td></tr>
</tbody>
</table>
</div><h3 id="heading-ssr-page-with-kv-cold-reads"><em>SSR Page with KV cold reads</em></h3>
<p>Lastly, in this test, we combine reading a random quote (which usually results in a cold KV read) with rendering it server-side in a page.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Region</strong></td><td><strong>Latency</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🇫🇷 koyeb_par Paris, France</td><td>111ms</td></tr>
<tr>
<td>🇬🇧 lhr London, United Kingdom</td><td>108ms</td></tr>
<tr>
<td>🇳🇱 railway_europe-west4-drams3a Amsterdam, Netherlands</td><td>125ms</td></tr>
<tr>
<td>🇫🇷 cdg Paris, France</td><td>133ms</td></tr>
<tr>
<td>🇩🇪 koyeb_fra Frankfurt, Germany</td><td>139ms</td></tr>
<tr>
<td>🇩🇪 fra Frankfurt, Germany</td><td>146ms</td></tr>
<tr>
<td>🇸🇪 arn Stockholm, Sweden</td><td>142ms</td></tr>
<tr>
<td>🇳🇱 ams Amsterdam, Netherlands</td><td>70ms</td></tr>
<tr>
<td>🇺🇸 railway_us-east4-eqdc4a Virginia, USA</td><td>151ms</td></tr>
<tr>
<td>🇺🇸 koyeb_was Washington, USA</td><td>159ms</td></tr>
<tr>
<td>🇺🇸 ewr Secaucus, New Jersey, USA</td><td>201ms</td></tr>
<tr>
<td>🇺🇸 iad Ashburn, Virginia, USA</td><td>209ms</td></tr>
<tr>
<td>🇺🇸 ord Chicago, Illinois, USA</td><td>217ms</td></tr>
<tr>
<td>🇺🇸 dfw Dallas, Texas, USA</td><td>220ms</td></tr>
<tr>
<td>🇺🇸 sjc San Jose, California, USA</td><td>191ms</td></tr>
<tr>
<td>🇺🇸 railway_us-west2 California, USA</td><td>201ms</td></tr>
<tr>
<td>🇨🇦 yyz Toronto, Canada</td><td>255ms</td></tr>
<tr>
<td>🇺🇸 lax Los Angeles, California, USA</td><td>257ms</td></tr>
<tr>
<td>🇺🇸 koyeb_sfo San Francisco, USA</td><td>268ms</td></tr>
<tr>
<td>🇮🇳 bom Mumbai, India</td><td>422ms</td></tr>
<tr>
<td>🇯🇵 nrt Tokyo, Japan</td><td>332ms</td></tr>
<tr>
<td>🇸🇬 sin Singapore, Singapore</td><td>284ms</td></tr>
<tr>
<td>🇧🇷 gru Sao Paulo, Brazil</td><td>327ms</td></tr>
<tr>
<td>🇸🇬 railway_asia-southeast1-eqsg3a Singapore, Singapore</td><td>632ms</td></tr>
<tr>
<td>🇸🇬 koyeb_sin Singapore, Singapore</td><td>677ms</td></tr>
<tr>
<td>🇿🇦 jnb Johannesburg, South Africa</td><td>673ms</td></tr>
<tr>
<td>🇦🇺 syd Sydney, Australia</td><td>385ms</td></tr>
<tr>
<td>🇯🇵 koyeb_tyo Tokyo, Japan</td><td>350ms</td></tr>
</tbody>
</table>
</div><h2 id="heading-observations">Observations</h2>
<p>It is interesting to see how you can infer how the KV store works just by watching the numbers. It appears the KV store is not actively replicated; rather, KV pairs are copied “on-demand” to remote locations. When cached (by default for 1 minute), subsequent reads are fast. The latencies of such “hot” KV pairs are pretty good overall. No complaints here. How long the pair remains cached there can also be configured using the <code>cacheTtl</code> parameter of the KV <code>get</code> request. However, the downside of increasing that value is that the cached copy does not reflect changes/updates triggered from other locations during that time.</p>
<p>Unsurprisingly, cold reads have worse latencies. The other thing you can infer from the numbers is that there seems to be an “origin location”, and cold-read latencies increase with the distance to it. Therefore, pay attention to “where” you create the KV store, as it impacts all future latencies around the globe. Note that Workers KV might change in the future; this is merely an observation of its current state.</p>
<p>While read operations are OK, write operations are rather disappointing right now. I expected them to have great latencies too, writing to the “edge” and letting the propagation take place asynchronously, but it is the opposite. Writes appear to communicate with the “origin” storage. The time it takes to set a value grows the further away you are from where you created the KV store. This is kind of bad news, because setting/updating values is a pretty common operation, for example to authenticate users. Dear Cloudflare team, I hope you improve that part in the future.</p>
<h2 id="heading-a-word-of-caution">A word of caution</h2>
<p>If you develop your webapp, publish it and take a look at it yourself, you will probably not even notice the bad latencies: you will face the optimal latencies, with the origin KV store near you. However, someone at the other end of the planet will have an uglier experience. If that person hits a handful of cache misses or writes, the response time might quickly climb to a few seconds. That is <em>not</em> how I would expect a “distributed” KV store to behave. Let us be clear: right now, this behaves more like a centralized KV store with on-demand cached copies at the edge.</p>
<p>Quite ironically, it basically feels more like a traditional single-location database right now (plus caches). While the latency of a single cache miss or a single write is not dramatic, it can quickly pile up over multiple calls, and write-heavy webapps especially risk facing increased “sluggishness” depending on their location. Here as well, being “minimalistic” regarding KV calls should be taken to heart when designing a webapp using Workers.</p>
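<p>One concrete way to keep that pile-up in check: when several KV reads are independent of each other, issue them concurrently so their latencies overlap instead of adding up. A small sketch (binding name as before, keys illustrative):</p>

```javascript
// Awaiting each KV read in sequence pays one full round-trip per key...
async function serialReads(env, keys) {
  const results = [];
  for (const key of keys) {
    results.push(await env.QUOTES.get(key));
  }
  return results;
}

// ...while issuing them concurrently lets the round-trips overlap, so the
// total latency is roughly that of the slowest read, not the sum of all.
async function parallelReads(env, keys) {
  return Promise.all(keys.map((key) => env.QUOTES.get(key)));
}
```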
<p>Lastly, there was one more setting available in the Worker: “Default Placement” vs “Smart Placement”. I tried both but did not see noticeable changes in the latencies. I think that is because there is a single KV store call, and it takes time and traffic to gather telemetry and adjust the placement of workers. It might be great, but for this experiment, it had no effect at all.</p>
<h2 id="heading-single-page-applications-vs-server-side-rendering">Single-Page-Applications vs Server-Side-Rendering</h2>
<p>Here as well, one is not universally better or worse than the other, and the answer to which one to use is “it depends”.</p>
<p>Besides strong differences regarding frameworks and overall architecture, there are also practical, fundamental differences for the end user. It’s fascinating to see history repeating itself: the internet first started with server-rendered pages, then single-page applications with data fetching took over, and now SSR is resurging, just like in the past but with new tech stacks.</p>
<p>SSR is actually the easiest one to explain: you fetch all the required data server-side, put everything in a template and return the resulting page to the end user. It takes a bit of time and processing power server-side and is not cacheable, but the client gets a “finished” page.</p>
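<p>In Worker terms, that SSR flow can be sketched roughly as follows. The template and data shape are illustrative assumptions, not the actual app’s code:</p>

```javascript
// SSR in a nutshell: fetch the data server-side, interpolate it into an HTML
// template, and return the finished page to the client.
async function renderQuotePage(env, id) {
  const quote = await env.QUOTES.get(`quote:${id}`, { type: "json" });
  const html = `<!DOCTYPE html>
<html>
  <body>
    <blockquote>${quote.text}</blockquote>
    <cite>${quote.author}</cite>
  </body>
</html>`;
  return new Response(html, {
    headers: { "content-type": "text/html;charset=UTF-8" },
  });
}
```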
<p>The SPA does the opposite. Although the HTML/CSS/JS is static and cached (hence quickly fetched), the resources are typically much larger due to all the client-side JavaScript libraries needed. Then starts the heavy lifting, where data is fetched and the page rendered, typically while showing a loading spinner. As a result, the total time to render the page is longer.</p>
<p>However, interacting with the SPA is typically smoother <em>afterwards</em>, because interactions just exchange data with the server and make local changes to the page. In contrast, SSR means navigating and loading a new page. Hence, the choice whether SPA or SSR is more suited depends on how “interactive” the page/app should be.</p>
<blockquote>
<p>As a rule of thumb, if it’s more like a static “web page”, go for SSR, if it’s more like an interactive “web app”, go for SPA.</p>
</blockquote>
<p>Lastly, the nice thing about Astro, picked here as an illustrative example, is that the whole spectrum is possible: static pages, SPA and SSR.</p>
<h2 id="heading-sources">Sources</h2>
<p>The source code of this experiment is here: <a target="_blank" href="https://github.com/dagnelies/quoted-day">https://github.com/dagnelies/quoted-day</a></p>
<p>If you have a GitHub and a Cloudflare account, you can also fork &amp; deploy by clicking here:</p>
<p><a target="_blank" href="https://deploy.workers.cloudflare.com/?url=https://github.com/dagnelies/quoted-day"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" /></a></p>
<p>If the button doesn’t work, here it is as link instead: <a target="_blank" href="https://deploy.workers.cloudflare.com/?url=https://github.com/dagnelies/quoted-day">https://deploy.workers.cloudflare.com/?url=https://github.com/dagnelies/quoted-day</a></p>
<p>It will fork the GitHub repository and deploy it on an internal URL so that you can preview it. Afterwards, you can edit the code and it will auto-deploy it, etc.</p>
<p>Note that the example references a KV store that is mine, so you will have to create your own KV store and swap the QUOTES KV id in the <code>wrangler.json</code> file with yours. You will also have to fill it with quotes initially if you want to reproduce the example. Luckily, there are scripts in the <code>package.json</code> to do just that.</p>
<p>Everything beyond this point would deserve a tutorial of its own. This was merely the result of an experiment: how the latencies hold up, plus some insights into the platform. Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[Why I prefer Maven over Gradle]]></title><description><![CDATA[In the Java world, one of the first questions developers encounter is "should I use Gradle or Maven as build tool?". It's a fundamental decision which will stick with you over time. And when googling it, Gradle's biased comparison even pops up as the top...]]></description><link>https://blog.angelside.net/why-i-prefer-maven-over-gradle</link><guid isPermaLink="true">https://blog.angelside.net/why-i-prefer-maven-over-gradle</guid><category><![CDATA[maven]]></category><category><![CDATA[gradle]]></category><category><![CDATA[Java]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Sat, 17 Feb 2024 08:57:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1708159918246/a6d70034-2a3a-462a-873c-8c43210e3491.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the Java world, one of the first questions developers encounter is <em>"should I use Gradle or Maven as build tool?"</em>. It's a fundamental decision which will stick with you over time. And when googling it, Gradle's biased comparison even pops up as the top search result (at least for me).</p>
<p>At first sight, Gradle looks cool:</p>
<ul>
<li><p>Their website looks way nicer and more polished than Maven's</p>
</li>
<li><p>The syntax is much more compact than Maven's verbose XML</p>
</li>
<li><p>Gradle is "newer" while Maven is "older"</p>
</li>
<li><p>Gradle is much faster (according to them)</p>
</li>
</ul>
<p>No wonder people pick it up when faced with uncertainty and just wanna get started. So now, let me tell you what's wrong with Gradle IMHO and why Maven is still the better option, even so many years later.</p>
<h2 id="heading-configuration-vs-scripting">Configuration vs Scripting</h2>
<p>Basically, the <code>pom.xml</code> that you define in <strong>Maven is a "configuration"</strong>. You define the name, the version, the list of dependencies, etc. Since it follows a specific schema, with a set of properties to define, you can also look at it visually through a UI for example. It's a declarative definition of your library/app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707747413246/a1906d1f-4e9e-4b89-9f26-5171e1dbe9a6.png" alt class="image--center mx-auto" /></p>
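<p>For illustration, a minimal <code>pom.xml</code> could look like the sketch below (group, artifact and dependency versions are placeholder values):</p>

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- purely declarative: what the project *is*, not how to build it -->
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.10.1</version>
    </dependency>
  </dependencies>
</project>
```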
<p>In contrast, a <strong>Gradle build script</strong> is exactly that: a script. It uses the <a target="_blank" href="https://www.groovy-lang.org/">Groovy</a> language, or more recently also Kotlin, to let you write anything you want. Let it sink in: you use a programming language to define what the build should do. You can import other scripts, send an HTTP request to check the current weather and insert a funny AI-generated picture in your build artifact.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707748150235/7ef9b475-e867-4d0f-84af-2d4ac62230a9.png" alt class="image--center mx-auto" /></p>
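<p>For comparison, a roughly equivalent <code>build.gradle</code> (Groovy DSL) could look like this; and since it is a script, nothing stops you from adding arbitrary logic right next to it:</p>

```groovy
plugins {
    id 'java'
}

group = 'com.example'
version = '1.0.0'

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.google.code.gson:gson:2.10.1'
}
```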
<p>While they have many aspects in common, their nature ("configuration" vs "script") is what differentiates them fundamentally.</p>
<p>You may think: "Isn't it great if I can do anything with that build script?! It's ultimate freedom!". That is right, but this boon is also a curse.</p>
<blockquote>
<p>When I see a Maven project, with a pom.xml, I know what it does and where to find what. It's always the same. Directories, commands to run, changing the version, whatever, it's the same for all maven projects.</p>
<p>When I see a Gradle project, I have no idea what the build script does. If you don't have a clear documentation ready, you'll have to dive into the build script to actually discover and try to understand what it exactly does.</p>
</blockquote>
<h2 id="heading-the-price-of-freedom">The price of freedom</h2>
<p>It's not rare to need something specific in your build. Both Maven and Gradle make it possible, but here too, their approaches are opposite.</p>
<p>In Gradle, it's straightforward. Since it's scripting, just write whatever you want, you can do anything very easily. Your own build stages, calling functions, using variables, importing some other scripts, whatever. It's easy.</p>
<p>In Maven, it's the opposite: adding something custom is more difficult. You will have to use a plugin to enable the specific functionality, or even write a plugin yourself if really necessary. While writing a plugin is definitely more work, it also kind of enforces reusability.</p>
<p>The takeaway here is the same as before. While Maven builds tend to always follow the same build stages and conventions, Gradle builds tend to become more and more complex and customized over time, because it's so easy to "just add a few lines" to the build script. Look at it after a few years, and the Maven <code>pom.xml</code> is likely almost as readable as in the first days, while the Gradle <code>build.gradle</code> script has become rocket science.</p>
<blockquote>
<p>As an exercise, I picked a random <code>build.gradle</code> file from another team at work to look at. It had over a thousand lines, and the few dependencies it had were externalized in another file and combined in a fancy way.</p>
<p>In contrast, pick any Maven project, and the list of dependencies will always be in the same place, in the &lt;dependencies&gt;...&lt;/dependencies&gt; tag.</p>
</blockquote>
<p><em>History repeating itself.</em> As a side note, it's interesting to see that in the very early days of Java, before Maven was born, build scripts were the norm. In the beginning, they were plain shell scripts invoking <code>javac ...</code> to compile the source code, package it, etc. Then came "ant" to do the same in a bit more structured way, but builds still tended to become customized and complex over time. Then one day came the idea of a more declarative approach: defining a project and its dependencies while letting the tool take care of how it is built. Maven was born. Then, some day, Gradle was born, because "I want to customize stuff".</p>
<h2 id="heading-gradle-lies-in-your-face">Gradle lies in your face</h2>
<p>Now, this is a little grudge of mine against Gradle's marketing habits. Their website features a "Gradle vs Maven" comparison claiming that Gradle is "oh so much faster" than Maven, along with the following picture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707751546753/9c1332b6-cd25-4655-a5e1-c65dfb19af29.png" alt class="image--center mx-auto" /></p>
<p>Now, let's take a closer look...</p>
<p>First, what's shocking is that a "Clean build with tests" is so much faster than the initial build! It's almost instant, including tests! Let me get this straight: <em>this is not "clean" at all. It's just doing nothing</em>. I find Maven much more sensible here: it actually rebuilds everything from scratch. To go a bit further, a "build" in Maven will just check for changes and compile the changed files, which would result in a similar figure, while a "clean build" will remove the whole output directory and rebuild everything. That should be the expected behavior, unlike Gradle's "clean build" not cleaning anything. After all, the aim of a clean build is usually to fix issues caused by something undesirable lying around in the build directory, for whatever reason.</p>
<p>Then, let's look at the normal case: is Gradle really twice as fast? Well, here is another question for you: who compiles the source code? ...got an idea? It's the <code>javac</code> compiler from the JDK, not the build tool! So why would Gradle be twice as fast?! Here is the trick: <em>Gradle runs the tests in parallel while Maven runs them sequentially</em>. That is the reason! Gradle ain't faster at compiling or anything, it just runs the tests in parallel. I dislike this default. It's just a question of time until you get tests with side-effects and race conditions. Then you'll obtain "Heisenberg tests" that sometimes succeed and sometimes fail, depending on how their executions overlap. You'll wonder why and waste lots of time investigating the issue. Moreover, the test suite usually runs as a background job after commits anyway.</p>
<p>Now, while I dislike Gradle's defaults, what really annoys me is how they distort the truth. They should say "we run tests in parallel by default and our 'clean' does nothing"! That would have been the correct way to put it, instead of misleading statements insinuating that they compile faster.</p>
<h2 id="heading-gradle-is-not-simple">Gradle is not simple</h2>
<p>For Maven, the scope of dependencies is relatively straightforward:</p>
<ul>
<li><p>Compile (the "usual" dependency)</p>
</li>
<li><p>Test (for tests)</p>
</li>
<li><p>Provided (provided at runtime by JDK or a container)</p>
</li>
<li><p>Runtime (quite rare; for drivers and the like, available at runtime but not needed for compiling)</p>
</li>
</ul>
<p>It's enough and I never needed anything else.</p>
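<p>In Maven, these scopes are simply declared per dependency; for example (the artifacts and versions below are illustrative):</p>

```xml
<dependencies>
  <!-- "compile" is the default scope, no need to state it -->
  <dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.10.1</version>
  </dependency>
  <!-- only on the test classpath -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13.2</version>
    <scope>test</scope>
  </dependency>
  <!-- provided at runtime by the container -->
  <dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>6.0.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```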
<p>Gradle on the other hand has <a target="_blank" href="https://docs.gradle.org/current/userguide/java_library_plugin.html#sec:java_library_configurations_graph"><em>lots</em> of scopes</a>:</p>
<ul>
<li><p>api</p>
</li>
<li><p>implementation</p>
</li>
<li><p>compileOnly</p>
</li>
<li><p>compileOnlyApi</p>
</li>
<li><p>runtimeOnly</p>
</li>
<li><p>testImplementation</p>
</li>
<li><p>testCompileOnly</p>
</li>
<li><p>testRuntimeOnly</p>
</li>
<li><p>...a few more deprecated scopes</p>
</li>
<li><p>...a few more classpath scopes</p>
</li>
<li><p>...you can also extend and combine scopes</p>
</li>
</ul>
<p>Well, you basically get it. Gradle is "super-customizable", so much so that you often wonder what exactly it does, or make a mistake without realizing it. Gradle sells it as "Maven has few, built-in dependency scopes, which forces awkward module architectures", but IMHO it's Gradle that is confusing and overcomplicated here, while Maven has exactly what's actually required.</p>
<p>That is just the tip of the iceberg. But basically, Gradle is super-customizable while Maven favors conventions. No wonder Gradle is also a company that thrives on support and training. If it were simple, such things would not sell.</p>
<h2 id="heading-gradle-needs-maven-but-not-the-other-way-round">Gradle needs Maven, but not the other way round</h2>
<p>Every single library in the Maven Central Repository must have a <code>pom.xml</code>. It's the declarative definition of the library containing its name, version, license, etc., and most importantly the list of its dependencies. Without a Maven <code>pom.xml</code>, there would be no Central Repository nor any dependency management.</p>
<p>Whether you use Gradle or Maven, both read the <code>pom.xml</code> Maven definitions to build the dependency tree. It's at the core of the dependency resolution system, pulling all transitive dependencies and resolving version conflicts.</p>
<p>In other words, Maven can live without Gradle, but Gradle still needs Maven to exist. Maven just applies a standardized build based on the <code>pom.xml</code>, while Gradle builds in its own way and generates a <code>pom.xml</code> as a build step if you actually want to publish your library.</p>
<h2 id="heading-maven-isnt-perfect-either">Maven isn't perfect either</h2>
<p>Now, I bashed Gradle a lot, but Maven isn't perfect either. It has issues too. Their website sucks IMHO, it could welcome YAML as a more compact alternative format, some plugins should be built-in, and the format itself could be tweaked here and there. But overall, I find it OK considering it's a format that has lived for more than 20 years.</p>
<p>The other drawback is a lack of flexibility. It's indeed rigid in how it expects your project to be, and may become problematic if you need, for example, to mix multiple different techs: building a Node project, running a Python script, etc. as part of the build procedure to place some extra stuff inside the produced artifact. But for that, IMHO, it's better to use CI scripts, running as GitHub Actions or GitLab pipelines, to build a "mixed bundle". Let each tech stack build its own artifacts and combine them later through scripting. I favor that approach over pushing the build script customizations too far.</p>
<h2 id="heading-take-it-with-a-grain-of-salt">Take it with a grain of salt</h2>
<p>While I bashed Gradle and praised Maven, it should be taken with a grain of salt. At the end of the day, they are just tools, and either can be used wisely or like a fool.</p>
<p>With Maven too, you can produce "monster <code>pom.xml</code> files" by using tons of plugins and super-complex configurations overriding all defaults. Likewise, Gradle is not necessarily a monster. Use it wisely, keep your build script clean, refrain from adding custom build steps, and you will do just fine. It's not bad per se.</p>
<p>It's just that by default, in the hands of average developers, Maven's <code>pom.xml</code> will tend to remain understandable (because it takes effort to escape the conventions) while Gradle's <code>build.gradle</code> will tend to become more complex and customized over time (because it's so easy to do so). All the small shortcuts and little extra steps that stray from the build conventions tend to become liabilities in the long term.</p>
<p>As said previously, Gradle's great flexibility and customizability of the build is both a boon and a curse. Although I prefer to build "generic" projects where I can, because it's by far simpler to maintain in the long run, using Gradle definitely has its place when you need more specific stuff that requires customization.</p>
<blockquote>
<p>TL;DR: as a rule of thumb, Maven's <code>pom.xml</code> tends to remain fairly generic over time, while Gradle's <code>build.gradle</code> leans towards being highly customized and therefore complex. This is due to their "nature": while Maven is based on a rigid project "definition", Gradle is a free-form build "script".</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Randomly generated avatars]]></title><description><![CDATA[As a follow-up from an earlier article regarding the update to randomly generated default avatars for Passwordless.ID, I wanted to post a "how to". This is a beginner tutorial since making such avatars is actually really simple.

TL;DR; Here is the f...]]></description><link>https://blog.angelside.net/randomly-generated-avatars</link><guid isPermaLink="true">https://blog.angelside.net/randomly-generated-avatars</guid><category><![CDATA[Beginner Developers]]></category><category><![CDATA[SVG]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Wed, 19 Jul 2023 07:55:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689753302016/25d729c0-27d5-401b-b166-f2c1a9a6add9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>As a follow-up to an earlier <a target="_blank" href="https://blog.passwordless.id/replacing-avatar-portraits">article</a> regarding the update to randomly generated default avatars for <a target="_blank" href="http://Passwordless.ID">Passwordless.ID</a>, I wanted to post a "how to". This is a beginner tutorial since making such avatars is actually really simple.</p>
</blockquote>
<p>TL;DR; Here is the full demo.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://codepen.io/dagnelies/pen/rNQdZvM">https://codepen.io/dagnelies/pen/rNQdZvM</a></div>
<p> </p>
<h2 id="heading-the-image-format">The image format</h2>
<p>The first thing you should think about is the image format, usually, one of:</p>
<ul>
<li><p><strong>Jpeg</strong>: great for real user photos due to the high compression ratio. However, this compression also produces some "blur" on lines and sharp edges. As such it is not ideal for the avatars we are going to make.</p>
</li>
<li><p><strong>PNG</strong>: these have lossless compression. In other words, every pixel remains exactly the same as it was originally drawn. Edges and lines remain "sharp".</p>
</li>
<li><p><strong>SVG</strong>: these are scalable vector graphics. Unlike a "raster of pixels", it is a declarative format describing shapes and paths.</p>
</li>
</ul>
<blockquote>
<p>Of course, you could also save it as a "100% quality" Jpeg to avoid any quality loss, but then it is larger than PNGs. Jpeg compression is amazing though for common photos.</p>
</blockquote>
<p>In our case, we picked SVG for the upcoming avatar pictures. SVG used to be kind of avoided because it was not always well supported across all software platforms, but this is largely in the past.</p>
<p>SVG offers several benefits. The first is being scalable: due to its vector nature, it stays perfectly sharp at any scale, even zoomed in on a 4K display, while "raster" formats like Jpeg or PNG become "pixelated". The other is being more compact: while the byte size of Jpeg/PNG grows with the picture size, an SVG grows with the complexity of its shapes. For relatively simple graphics like the avatars here, it is super compact.</p>
<h2 id="heading-the-svg-template">The SVG "template"</h2>
<p>SVG is an XML format that describes the shapes. As such, what will be generated is a big XML string. To be more exact, we will fill the template below with the appropriate values.</p>
<pre><code class="lang-xml">  <span class="hljs-tag">&lt;<span class="hljs-name">svg</span> <span class="hljs-attr">xmlns</span>=<span class="hljs-string">"http://www.w3.org/2000/svg"</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"100"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"100"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">defs</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">linearGradient</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"gradient"</span> <span class="hljs-attr">x1</span>=<span class="hljs-string">"${startX}"</span> <span class="hljs-attr">y1</span>=<span class="hljs-string">"${startY}"</span> <span class="hljs-attr">x2</span>=<span class="hljs-string">"${endX}"</span> <span class="hljs-attr">y2</span>=<span class="hljs-string">"${endY}"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">stop</span> <span class="hljs-attr">offset</span>=<span class="hljs-string">"10%"</span> <span class="hljs-attr">stop-color</span>=<span class="hljs-string">"hsl(${startHue}, 100%, 50%)"</span> /&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">stop</span> <span class="hljs-attr">offset</span>=<span class="hljs-string">"90%"</span> <span class="hljs-attr">stop-color</span>=<span class="hljs-string">"hsl(${endHue}, 100%, 50%)"</span> /&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">linearGradient</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">defs</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">rect</span> <span class="hljs-attr">x</span>=<span class="hljs-string">"0"</span> <span class="hljs-attr">y</span>=<span class="hljs-string">"0"</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"100"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"100"</span> <span class="hljs-attr">fill</span>=<span class="hljs-string">"url(#gradient)"</span> /&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">text</span> <span class="hljs-attr">x</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">y</span>=<span class="hljs-string">"55"</span> <span class="hljs-attr">text-anchor</span>=<span class="hljs-string">"middle"</span> <span class="hljs-attr">dominant-baseline</span>=<span class="hljs-string">"middle"</span> <span class="hljs-attr">font-size</span>=<span class="hljs-string">"75"</span> <span class="hljs-attr">font-family</span>=<span class="hljs-string">"Times New Roman"</span> <span class="hljs-attr">fill</span>=<span class="hljs-string">"#ffffff"</span>&gt;</span>${char}<span class="hljs-tag">&lt;/<span class="hljs-name">text</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">svg</span>&gt;</span>
</code></pre>
<p>Once this template is filled with meaningful values, you will obtain an avatar SVG image that can be stored as a plain normal ".svg" file.</p>
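<p>Filling the template can be sketched with a JavaScript template literal, for instance like this (the function name is an illustrative choice; the values come from the random-generation snippet further below):</p>

```typescript
// Illustrative sketch: fill the SVG template above with concrete values.
function renderAvatar(char: string, startHue: number, endHue: number,
                      startX: number, startY: number, endX: number, endY: number): string {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
    <defs>
      <linearGradient id="gradient" x1="${startX}" y1="${startY}" x2="${endX}" y2="${endY}">
        <stop offset="10%" stop-color="hsl(${startHue}, 100%, 50%)" />
        <stop offset="90%" stop-color="hsl(${endHue}, 100%, 50%)" />
      </linearGradient>
    </defs>
    <rect x="0" y="0" width="100" height="100" fill="url(#gradient)" />
    <text x="50" y="55" text-anchor="middle" dominant-baseline="middle" font-size="75" font-family="Times New Roman" fill="#ffffff">${char}</text>
  </svg>`;
}
```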
<p>Alternatively, you can also deliver it as a "data URL" since it is quite compact. This simply means encoding the resource directly in the URL instead of fetching it from a "plain URL". It is composed of two parts: the mime-type (<code>image/svg+xml</code> in this case) and the (base64 encoded) data. This can be used like any other URL in the <code>src</code> attribute of an image, as follows.</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"data:image/svg+xml;base64,{{the-base64-encoded-svg}}"</span> /&gt;</span>
</code></pre>
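<p>Encoding the SVG string into such a data URL is a one-liner; here is a hedged sketch (<code>svgToDataUrl</code> is an illustrative name; in browsers, <code>btoa(svg)</code> replaces the <code>Buffer</code> call):</p>

```typescript
// Sketch: turn an SVG string into a data URL usable as an <img> src.
function svgToDataUrl(svg: string): string {
  // Node.js: Buffer handles the base64 encoding; in browsers use btoa(svg)
  const base64 = Buffer.from(svg, "utf-8").toString("base64");
  return `data:image/svg+xml;base64,${base64}`;
}
```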
<p>Voilà, you got your image!</p>
<h2 id="heading-getting-some-random-values">Getting some random values</h2>
<p>The missing step is now filling this SVG template with some random values. Alternatively, if you want something more deterministic, you could also use the hash value of the name for example.</p>
<p>As you saw in the SVG template, <a target="_blank" href="https://www.w3schools.com/colors/colors_hsl.asp">HSL colors</a> were used instead of RGB. HSL stands for Hue-Saturation-Lightness. This makes it easy to generate bright colors across all rainbow hues, with maximal saturation and average "lightness".</p>
<pre><code class="lang-typescript">  <span class="hljs-comment">// Gradient colors</span>
  <span class="hljs-keyword">const</span> startHue = <span class="hljs-built_in">Math</span>.round(<span class="hljs-built_in">Math</span>.random() * <span class="hljs-number">360</span>);
  <span class="hljs-keyword">const</span> endHue   = <span class="hljs-built_in">Math</span>.round(<span class="hljs-built_in">Math</span>.random() * <span class="hljs-number">360</span>);

  <span class="hljs-comment">// Gradient direction</span>
  <span class="hljs-keyword">const</span> angle = <span class="hljs-built_in">Math</span>.random() * <span class="hljs-number">2</span> * <span class="hljs-built_in">Math</span>.PI

  <span class="hljs-comment">// Calculate the start and end points of the gradient</span>
  <span class="hljs-keyword">const</span> startX = (<span class="hljs-built_in">Math</span>.cos(angle) + <span class="hljs-number">1</span>) / <span class="hljs-number">2</span>;
  <span class="hljs-keyword">const</span> startY = (<span class="hljs-built_in">Math</span>.sin(angle) + <span class="hljs-number">1</span>) / <span class="hljs-number">2</span>;
  <span class="hljs-keyword">const</span> endX = <span class="hljs-number">1</span> - startX;
  <span class="hljs-keyword">const</span> endY = <span class="hljs-number">1</span> - startY;

  <span class="hljs-comment">// The character to appear on the avatar</span>
  <span class="hljs-keyword">const</span> char = name.charAt(<span class="hljs-number">0</span>).toUpperCase()
</code></pre>
<p>For the gradient direction, it's a bit trickier since an angle cannot be provided directly. There are some "transforms" available, but to ensure the widest compatibility with SVG renderers, sticking to the basics seems a safe bet. As such, the angle is converted into start and end coordinates for the gradient.</p>
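<p>As for the deterministic variant mentioned earlier, the hues can be derived from a simple string hash instead of <code>Math.random()</code>, so that the same name always yields the same avatar. The hash function below is an illustrative djb2-like choice, not the one used in the demo:</p>

```typescript
// Sketch: derive stable hues from a name via a simple 32-bit string hash.
function hashCode(text: string): number {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = (h * 33 + text.charCodeAt(i)) >>> 0; // keep it a 32-bit unsigned int
  }
  return h;
}

const userName = "Arnaud";
const startHue = hashCode(userName) % 360;
const endHue = hashCode(userName + "!") % 360; // vary the input for a second hue
```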
<h2 id="heading-thank-you">Thank you</h2>
<p>The resulting full source code can be seen in the example provided at the beginning. :)</p>
<p><a target="_blank" href="https://codepen.io/dagnelies/pen/rNQdZvM">A Pen by Arnaud Dagnelies (</a><a target="_blank" href="http://codepen.io">codepen.io</a><a target="_blank" href="https://codepen.io/dagnelies/pen/rNQdZvM">)</a></p>
]]></content:encoded></item><item><title><![CDATA[Servers, docker or serverless?]]></title><description><![CDATA[Whether it's good ol' servers, docker containers or lambda functions in the cloud, the code must run on physical machines in the end.

Nevertheless, their natures are very different, with both advantages and drawbacks. So let's cover the basics and q...]]></description><link>https://blog.angelside.net/servers-docker-or-serverless</link><guid isPermaLink="true">https://blog.angelside.net/servers-docker-or-serverless</guid><category><![CDATA[server]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Tue, 13 Jun 2023 07:31:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/M5tzZtFCOfs/upload/8be9ede108334300b56b63a66b2ed835.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Whether it's good ol' servers, docker containers or lambda functions in the cloud, the code must run on physical machines in the end.</p>
</blockquote>
<p>Nevertheless, their natures are very different, with both advantages and drawbacks. So let's cover the basics and quickly summarize the ups and downs of each.</p>
<h2 id="heading-good-ol-servers">Good ol' servers</h2>
<p>Traditional servers should <em>not</em> be underestimated. By servers, we mean either physical hardware or cloud VMs. Even on cheap hardware or VMs, serving up to millions of requests per minute is commonplace. That's already a lot of users. Moreover, if the "stack" is local, with database and application logic on the same machine, there is no communication overhead, which boosts performance to the maximum. It is a very cost-effective solution; on the other hand, servers take a bit more effort to administer and call for adequate Linux know-how.</p>
<p>Moreover, achieving redundancy and rolling updates requires attention. This is commonly solved by placing at least two servers behind a load balancer to avoid any downtime. Regarding software updates, the usual way is to stop the application, update it, then restart it, performing this sequentially across the servers. The load balancer takes care of avoiding the temporarily unresponsive server. While this is typically done manually in the beginning, it can be automated with the right tools later on when the need arises, as can scaling up to more servers.</p>
<h2 id="heading-docker-images">Docker images</h2>
<p>These images are run in containers that are typically distributed over larger distributed infrastructures. Containers also have the notion of being "volatile": they are typically quickly started and destroyed. For example, if the software crashes or isn't able to run properly, it can simply be destroyed and a new one created on the fly with the same image, effectively restoring it to its original state in a short time.</p>
<p>Using docker in production involves two levels: the software "images" and the "orchestrator" responsible for running these images on various servers. The latter, like Kubernetes, is typically offered as a service. With docker, scalability is achieved by simply launching "one more container" on the underlying infrastructure.</p>
<p>In typical docker "stacks", the databases, APIs and UIs run in separate containers. While this is not an ironclad rule, it is regarded as best practice. This makes it possible to update and scale each part independently according to usage; on the other hand, it adds some overhead for network communication between containers. Some effort is also required to configure how all the containers are distributed and "linked together" to interact properly.</p>
<p>The usual way updates are done is by simply launching new containers running the new image and removing the obsolete ones afterwards. This is straightforward if images are independent, but if an update requires a coordinated update of multiple images at once, it might become trickier.</p>
<h2 id="heading-serverless">Serverless</h2>
<p>The idea here is that you only write the code and that it is invoked "on demand". You don't take care of the infrastructure at all; the provider will take care of it. This comfort has a strong constraint: it must be completely stateless. There is no persistence guarantee between requests since it can run "anywhere". This requires planning accordingly in advance, usually coupling it with databases as a service and "object storage" services to persist files.</p>
<p>This "serverless" paradigm comes in two flavours, related to their underlying technology.</p>
<h3 id="heading-like-aws-lambda">Like AWS Lambda</h3>
<p>This launches a minimal docker container, processes the request, reuses the container for further requests and destroys it after some idle time. This offers great freedom regarding the programming language and frameworks used to process the request, and imposes almost no constraints on execution. However, there is a downside: the very first request takes a long time, the so-called "cold start", since it involves starting a whole environment. For the right price, it is possible to keep them "hot".</p>
<p>If you keep them hot, they are expensive; if not, your users often suffer from high latencies. As such, they are best suited for on-demand background jobs or long-running tasks IMHO.</p>
<h3 id="heading-like-cloudflare-workers">Like Cloudflare Workers</h3>
<p>This runs JS/TS functions directly in a "chrome-based engine". It is cheap, fast and easily run in a distributed manner, which makes it perfect for small functions running "at the edge" of the network. Every time a request is processed, the function's code is loaded and interpreted on the fly to produce the response. Async code invoking other services is also possible.</p>
<p>These functions should be as self-contained as possible, and providers usually put clear constraints on the overall size and max processing time. While it is great for small functions, it becomes counter-productive for larger programs offering complex functionality.</p>
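<p>As a sketch of such an edge function, here is a minimal Workers-style module in TypeScript: a stateless <code>fetch</code> handler that computes the response on the fly (the greeting and path handling are illustrative, not from an actual deployment):</p>

```typescript
// Minimal sketch of a Cloudflare-Workers-style module worker: a stateless
// fetch handler. No state survives between requests; anything persistent
// must go to an external store (KV, database, object storage).
const worker = {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    return new Response(`Hello from the edge, path: ${url.pathname}`, {
      headers: { "content-type": "text/plain" },
    });
  },
};

export default worker;
```

<p>Locally, such a handler can be exercised with any runtime providing the standard <code>Request</code>/<code>Response</code> objects.</p>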
<h2 id="heading-what-should-i-pick">What should I pick?</h2>
<p>Scaling can be achieved with either and for most things there is no clear-cut choice which one is clearly better. It's rather a balancing of advantages and disadvantages, mixed in with the team's expertise, experience and preferences.</p>
<p>What follows are not iron-clad rules but rather rule-of-thumb pieces of advice to be in the comfort zone.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Servers/VMs</td><td>Docker</td><td>Serverless (Lambda like)</td><td>Serverless (Workers like)</td></tr>
</thead>
<tbody>
<tr>
<td>Low amount of code</td><td></td><td></td><td><strong>🙂</strong></td><td><strong>🙂</strong></td></tr>
<tr>
<td>High amount of code</td><td><strong>🙂</strong></td><td><strong>🙂</strong></td><td></td><td></td></tr>
<tr>
<td>No linux know-how</td><td></td><td></td><td><strong>🙂</strong></td><td><strong>🙂</strong></td></tr>
<tr>
<td>Cheap</td><td><strong>🙂</strong></td><td></td><td></td><td><strong>🙂</strong></td></tr>
<tr>
<td>Lots of "services"</td><td></td><td><strong>🙂</strong></td><td></td><td></td></tr>
<tr>
<td>Keep large stuff in memory</td><td><strong>🙂</strong></td><td></td><td></td><td></td></tr>
<tr>
<td>Computation intensive</td><td><strong>🙂</strong></td><td><strong>🙂</strong></td><td><strong>🙂</strong></td><td></td></tr>
<tr>
<td>Background jobs "on demand"</td><td></td><td><strong>🙂</strong></td><td><strong>🙂</strong></td><td></td></tr>
<tr>
<td>Super low latencies</td><td></td><td></td><td></td><td><strong>🙂</strong></td></tr>
</tbody>
</table>
</div><p>Like I said, the smileys are not hard requirements. For example, you can also run simple code on servers and complex code on serverless; it's just that you will likely leave the comfort zone and lose some of its benefits.</p>
<p><em>If the functionality is simple</em>, go serverless with Cloudflare Workers-like offerings! This is free to start with for most providers, scales well, offers top latencies worldwide and has excellent pricing even at large scale. The ecosystem might not be very mature, but simple functionality comes with low complexity, where an ecosystem is not that critical.</p>
<p><em>If you require large amounts of memory or GPUs</em>, physical servers or VMs might be advantageous for pricing alone, and for efficiency, since there are no abstractions in between.</p>
<p><em>If you have a complex software landscape</em> with many web services, use docker. Since the web services can easily be packaged as docker images and almost every open source software already has docker images too, it is simplest to use them directly.</p>
<p><em>If it's something like background jobs</em> running once in a while, use AWS Lambda-like functions. They can process complex stuff, run longer and you are not worried about cold starts and latency issues.</p>
<h2 id="heading-some-use-cases">Some use cases</h2>
<h3 id="heading-passwordlessidhttppasswordlessid"><a target="_blank" href="http://Passwordless.ID">Passwordless.ID</a></h3>
<p>Here, the API could be divided into individual, largely decoupled functions of reasonable complexity. That made Cloudflare Workers a perfect candidate. Thanks to that, the code can run over worldwide distributed datacentres with optimal latency. Perfect for manageability, scalability and pricing.</p>
<h3 id="heading-keyvaluerockshttpkeyvaluerocks"><a target="_blank" href="http://KeyValue.Rocks">KeyValue.Rocks</a></h3>
<p>This is a data-hungry beast. It stores large amounts of data in memory for fast access and high throughput. As such, VMs were used. They are cheaper and less complex than orchestrating a Docker deployment. Since the project has meanwhile become fairly low-activity, with rare updates, this was adequate.</p>
]]></content:encoded></item><item><title><![CDATA[Extracting addresses from OpenStreetMaps]]></title><description><![CDATA[Why?

Because there is no worldwide quality source for addresses!

Really, that's no joke. There are many commercial providers for "industrialized countries" of variable quality/pricing but worldwide coverage is lacking, the data formats are diverse ...]]></description><link>https://blog.angelside.net/extracting-addresses-from-openstreetmaps</link><guid isPermaLink="true">https://blog.angelside.net/extracting-addresses-from-openstreetmaps</guid><category><![CDATA[openstreetmap]]></category><category><![CDATA[GeoJSON]]></category><category><![CDATA[addresses]]></category><dc:creator><![CDATA[Arnaud Dagnelies]]></dc:creator><pubDate>Mon, 15 May 2023 07:37:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Z8UgB80_46w/upload/5de4c78bcfa84144c5d042fc5c69fc50.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-why">Why?</h1>
<blockquote>
<p>Because there is no worldwide quality source for addresses!</p>
</blockquote>
<p>Really, that's no joke. There are many commercial providers for "industrialized countries" of variable quality/pricing, but worldwide coverage is lacking, the data formats are diverse and the license terms are provider-specific.</p>
<p>There are also some open source projects related to addresses, though each has its gotchas. Two of these projects are mentioned in the section "Honorable Mentions" at the end of this article, along with their drawbacks.</p>
<p>This project was born in order to provide quality addresses with worldwide coverage under an open license, by directly extracting addresses from the raw data dumps of <a target="_blank" href="https://openstreetmap.org">OpenStreetMap</a>.</p>
<h2 id="heading-birth-of-openstreetdataorghttpsopenstreetdataorg">Birth of <a target="_blank" href="https://openstreetdata.org">OpenStreetData.org</a></h2>
<p>What does it look like? Here is a screenshot, but if you prefer, check out <a target="_blank" href="https://openstreetdata.org">the website</a> directly.</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7tb78xn8teds3r89i3oh.png" alt="Image description" /></p>
<p>It is divided into two parts: extracts and addresses. A third part, "points of interest", was planned but not developed further due to lack of time.</p>
<h2 id="heading-country-dumps">Country dumps</h2>
<p>Country extracts are provided in two formats:</p>
<ul>
<li><p><a target="_blank" href="https://wiki.openstreetmap.org/wiki/PBF_Format">PBF</a>, the native OpenStreetMap binary format. This format is very compact and many tools can handle it efficiently. Nevertheless, it is not always very practical to handle due to its low-level nature. It's basically a huge list of points with IDs, lines that reference these IDs and relations that reference the lines.</p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/GeoJSON">GeoJson</a> sequences. It's a text file where each line is a "feature": a JSON object with arbitrary properties and a geometry with coordinates. Although the file size is typically larger and the processing sometimes slower, it offers other benefits. The JSON format is universal, the line-based sequence makes it straightforward to filter with grep-like tools, and the geometry of each feature can be parsed directly without having to go through the whole file.</p>
</li>
</ul>
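<p>Because each line of such a GeoJSON sequence is a self-contained JSON feature, filtering takes only a few lines of code. Here is a minimal sketch in Python: the <code>addr:housenumber</code> tag is a standard OSM key, but the sample data is made up for illustration.</p>

```python
import json

def iter_address_features(lines):
    """Yield each GeoJSON feature carrying an OSM address tag.

    Processes the input line by line, so arbitrarily large files
    can be filtered in constant memory.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature = json.loads(line)
        if "addr:housenumber" in feature.get("properties", {}):
            yield feature

# Two sample lines: one house with an address tag, one plain street.
sample = [
    '{"type": "Feature", "properties": {"addr:street": "Zunftgasse", '
    '"addr:housenumber": "3"}, "geometry": {"type": "Point", "coordinates": [9.65, 47.27]}}',
    '{"type": "Feature", "properties": {"highway": "residential"}, '
    '"geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}}',
]
matches = list(iter_address_features(sample))
```

<p>In practice you would pass an open file object instead of the list, e.g. <code>with open("country.geojsonl") as f: iter_address_features(f)</code>.</p>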
<p>Note though that the two formats are not 100% equivalent. During the conversion, some choices had to be made. In particular, in the original PBF a "closed line" (where the last point is the same as the first) can be interpreted either as a line or as an area. There is no clear-cut indication whether it's a "line" that happens to turn in a circle, like a roundabout, or a polygonal "area", like a building outline. This led to "closed lines" being interpreted as lines or polygons based on a lot of hand-picked feature properties. For example, if <code>building=...</code> was among the properties, it was considered a polygon, unless an <code>area=false</code> tag was also present, and so on.</p>
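<p>Roughly, the heuristic looks like this. This is only a sketch: the real conversion used many more hand-picked tags, and the set of area-indicating keys below is a small, illustrative subset.</p>

```python
# A few tags that usually indicate an area when the way is closed
# (illustrative subset, not the full list used in the conversion).
AREA_TAGS = {"building", "landuse", "natural", "amenity", "leisure"}

def is_closed(coords):
    """A way is 'closed' when its last point equals its first."""
    return len(coords) >= 4 and coords[0] == coords[-1]

def interpret(coords, tags):
    """Decide whether a way should become a line or a polygon.

    An explicit area=no/false tag forces the line interpretation,
    e.g. for a roundabout that happens to be a closed way.
    """
    if not is_closed(coords):
        return "line"
    if tags.get("area") in ("no", "false"):
        return "line"
    if AREA_TAGS & tags.keys():
        return "polygon"
    return "line"

square = [(0, 0), (1, 0), (1, 1), (0, 0)]
roundabout = interpret(square, {"junction": "roundabout"})  # a closed *line*
outline = interpret(square, {"building": "yes"})            # a building *area*
```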
<h2 id="heading-administrative-areas">Administrative areas</h2>
<p>Despite not being shown on the site, extracting precise boundaries of a country's provinces, regions, counties, cities, suburbs and so on was the first crucial step. How a country is subdivided into smaller areas varies greatly from country to country and is abstracted under the name "administrative areas" of various levels.</p>
<p>This step is crucial because of the way addresses are extracted. Streets and houses were extracted using "spatial joins" with the administrative areas. Their coordinates were used to determine which administrative areas (city, county, province...) they belong to, as well as the postal code ...if postal code areas are defined in the country.</p>
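<p>Conceptually, a spatial join assigns each point to the polygon that contains it. Here is a toy illustration in plain Python using a ray-casting point-in-polygon test; a real pipeline would use a GIS library with spatial indexes, and the area and house names below are invented.</p>

```python
def contains(polygon, x, y):
    """Ray-casting point-in-polygon test for a simple polygon
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_join(points, areas):
    """Attach to each point the name of the first area containing it."""
    return {
        name: next((a for a, poly in areas.items() if contains(poly, x, y)), None)
        for name, (x, y) in points.items()
    }

# Hypothetical postal-code area and two houses.
areas = {"postal_4864": [(0, 0), (2, 0), (2, 2), (0, 2)]}
houses = {"Abtsdorf 12": (1.0, 1.0), "far away": (5.0, 5.0)}
joined = spatial_join(houses, areas)
```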
<blockquote>
<p>Currently, the reason for missing (or wrong) addresses in some countries is an improper <a target="_blank" href="https://github.com/dagnelies/open-street-data/blob/main/admin_level_mapping.tsv">mapping of the administrative areas</a>.</p>
</blockquote>
<h3 id="heading-streets">Streets</h3>
<p>Here is an example of the "streets" for Austria:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>suburb</strong></td><td><strong>country</strong></td><td><strong>state</strong></td><td><strong>province</strong></td><td><strong>city</strong></td><td><strong>postal_code</strong></td><td><strong>street_name</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Abtsdorf</td><td>AT</td><td>Oberösterreich</td><td>Bezirk Vöcklabruck</td><td>Attersee am Attersee</td><td>4864</td><td>Abtsdorf</td></tr>
<tr>
<td>Abtsdorf</td><td>AT</td><td>Oberösterreich</td><td>Bezirk Vöcklabruck</td><td>Attersee am Attersee</td><td>4864</td><td>Altenberg</td></tr>
<tr>
<td>Abtsdorf</td><td>AT</td><td>Oberösterreich</td><td>Bezirk Vöcklabruck</td><td>Attersee am Attersee</td><td>4864</td><td>Attergauer Landesstraße</td></tr>
<tr>
<td>Abtsdorf</td><td>AT</td><td>Oberösterreich</td><td>Bezirk Vöcklabruck</td><td>Attersee am Attersee</td><td>4864</td><td>Attersee</td></tr>
<tr>
<td>Abtsdorf</td><td>AT</td><td>Oberösterreich</td><td>Bezirk Vöcklabruck</td><td>Attersee am Attersee</td><td>4864</td><td>Atterseestraße</td></tr>
<tr>
<td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr>
<td></td><td>AT</td><td>Vorarlberg</td><td>Bezirk Feldkirch</td><td>Marktgemeinde Rankweil</td><td>6830</td><td>Wüstenrotgasse</td></tr>
<tr>
<td></td><td>AT</td><td>Vorarlberg</td><td>Bezirk Feldkirch</td><td>Marktgemeinde Rankweil</td><td>6830</td><td>Zehentstraße</td></tr>
<tr>
<td></td><td>AT</td><td>Vorarlberg</td><td>Bezirk Feldkirch</td><td>Marktgemeinde Rankweil</td><td>6830</td><td>Zieglerweg</td></tr>
<tr>
<td></td><td>AT</td><td>Vorarlberg</td><td>Bezirk Feldkirch</td><td>Marktgemeinde Rankweil</td><td>6830</td><td>Zunftgasse</td></tr>
<tr>
<td></td><td>AT</td><td>Vorarlberg</td><td>Bezirk Feldkirch</td><td>Marktgemeinde Rankweil</td><td>6830</td><td>Übersaxner Straße</td></tr>
</tbody>
</table>
</div><p><em>168769 rows × 7 columns</em></p>
<p>It extracted all named streets from the raw data and determined the administrative areas and postal code each belongs to according to its centroid. As such, it is a slightly simplified street list. If a street crosses multiple cities or postal code areas, for example, it will solely be listed in the "main" one (according to its center). For more precise addresses, see below.</p>
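<p>For illustration, a street's centroid can be computed as the length-weighted average of its segment midpoints. This is a simplified sketch in planar coordinates; real geographic computations would account for projection.</p>

```python
import math

def centroid(line):
    """Length-weighted centroid of a polyline given as (x, y) vertices."""
    cx = cy = total = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        cx += (x1 + x2) / 2 * length
        cy += (y1 + y2) / 2 * length
        total += length
    return (cx / total, cy / total)

# A street running from x=0 to x=3 along y=0: its centroid lies at the middle,
# which decides the single area the whole street gets attributed to.
point = centroid([(0, 0), (3, 0)])
```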
<p>Note that "suburb" may be empty depending on the size of the city. This is normal since not all cities are further divided into suburbs.</p>
<h3 id="heading-houses">Houses</h3>
<p>Houses is a dataset listing each house (anything with a house number) individually, including its coordinates and the administrative areas it lies within.</p>
<h3 id="heading-addresses">Addresses</h3>
<p>In this case, the houses are "merged" into streets with house numbers. Compared to the "streets" approach, it results in a more fine-grained dataset.</p>
<ul>
<li><p>it includes only streets with at least a single house (number)</p>
</li>
<li><p>it differentiates between street sections with house number ranges belonging to different administrative areas or postal codes</p>
</li>
<li><p>it differentiates between different sides of the street (with odd/even house numbers) belonging to different administrative areas or postal codes</p>
</li>
<li><p>it includes bounding boxes (the x/y min/max columns)</p>
</li>
</ul>
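<p>A rough sketch of this merge step in plain Python, with invented input data; the real pipeline additionally splits a street when its houses fall into different areas or postal codes.</p>

```python
def merge_houses(houses):
    """Aggregate individual houses into one row per
    (postal_code, city, street): house-number range,
    odd/even flags and a bounding box."""
    rows = {}
    for h in houses:
        key = (h["postal_code"], h["city"], h["street"])
        row = rows.setdefault(key, {
            "house_min": h["number"], "house_max": h["number"],
            "house_odd": False, "house_even": False,
            "x_min": h["x"], "x_max": h["x"],
            "y_min": h["y"], "y_max": h["y"],
        })
        row["house_min"] = min(row["house_min"], h["number"])
        row["house_max"] = max(row["house_max"], h["number"])
        row["house_odd"] |= h["number"] % 2 == 1
        row["house_even"] |= h["number"] % 2 == 0
        row["x_min"] = min(row["x_min"], h["x"])
        row["x_max"] = max(row["x_max"], h["x"])
        row["y_min"] = min(row["y_min"], h["y"])
        row["y_max"] = max(row["y_max"], h["y"])
    return rows

houses = [
    {"postal_code": "1010", "city": "Wien", "street": "Akademiestraße",
     "number": 1, "x": 16.3709, "y": 48.2009},
    {"postal_code": "1010", "city": "Wien", "street": "Akademiestraße",
     "number": 13, "x": 16.3724, "y": 48.2036},
]
merged = merge_houses(houses)
```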
<p>Here is an example of such an address file for Austria.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>postal_code</strong></td><td><strong>city</strong></td><td><strong>street</strong></td><td><strong>x_min</strong></td><td><strong>x_max</strong></td><td><strong>y_min</strong></td><td><strong>y_max</strong></td><td><strong>house_min</strong></td><td><strong>house_max</strong></td><td><strong>house_odd</strong></td><td><strong>house_even</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>0</strong></td><td>1010</td><td>Vienna</td><td>Weihburggasse</td><td>16.375769</td><td>16.375769</td><td>48.205242</td><td>48.205242</td><td>26</td><td>26</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>1</strong></td><td>1010</td><td>Wien</td><td>Abraham-a-Sancta-Clara-Gasse</td><td>16.362970</td><td>16.363213</td><td>48.209789</td><td>48.209910</td><td>1</td><td>2</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>2</strong></td><td>1010</td><td>Wien</td><td>Akademiestraße</td><td>16.370855</td><td>16.372425</td><td>48.200877</td><td>48.203575</td><td>1</td><td>13</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>3</strong></td><td>1010</td><td>Wien</td><td>Albertinaplatz</td><td>16.368138</td><td>16.369344</td><td>48.204084</td><td>48.204750</td><td>1</td><td>3</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>4</strong></td><td>1010</td><td>Wien</td><td>Alte Walfischgasse</td><td>16.371740</td><td>16.371740</td><td>48.203559</td><td>48.203559</td><td>9</td><td>9</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>...</strong></td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr>
<td><strong>147137</strong></td><td>9991</td><td>Gemeinde Dölsach</td><td>Waidachweg</td><td>12.825955</td><td>12.827117</td><td>46.830659</td><td>46.831055</td><td>4</td><td>9</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>147138</strong></td><td>9991</td><td>Gemeinde Dölsach</td><td>Wenzl PLatz</td><td>12.841072</td><td>12.841634</td><td>46.826521</td><td>46.826902</td><td>1</td><td>3</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>147139</strong></td><td>9992</td><td>Gemeinde Iselsberg-Stronach</td><td>Großglockner Straße</td><td>12.841043</td><td>12.858008</td><td>46.833271</td><td>46.854501</td><td>1</td><td>206</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>147140</strong></td><td>9992</td><td>Gemeinde Iselsberg-Stronach</td><td>Iselsberg</td><td>12.835091</td><td>12.855994</td><td>46.833822</td><td>46.846260</td><td>5</td><td>212</td><td>True</td><td>True</td></tr>
<tr>
<td><strong>147141</strong></td><td>9992</td><td>Gemeinde Iselsberg-Stronach</td><td>Stronach</td><td>12.849133</td><td>12.858230</td><td>46.826562</td><td>46.833270</td><td>2</td><td>63</td><td>True</td><td>True</td></tr>
</tbody>
</table>
</div><p><em>146322 rows × 11 columns</em></p>
<p>It may not be perfect; for example, the first row, with the city name "Vienna" instead of "Wien", is quite mysterious.</p>
<h2 id="heading-challenges">Challenges</h2>
<h3 id="heading-big-data">"Big Data"</h3>
<p>Dealing with large data is challenging. It's not thousands of points, it's not millions, it's many billions of points, lines, polygons and relations.</p>
<p>Seems like a detail? Well, for example, you cannot even load the planet's data at once in memory. It's simply too big.</p>
<p>You cannot just "do as you please" with inefficient code. Every line of code, every operation, must be crafted with care, well thought out, and fine-tuned to keep processing time and memory to a minimum.</p>
<p>As an example, just processing the data of a single country takes many hours with the current code, and for the larger countries even 32 GB of RAM is not enough, despite best efforts.</p>
<h3 id="heading-producing-precise-country-extracts">Producing precise country extracts</h3>
<p>There are sites like <a target="_blank" href="http://geofabrik.de">geofabrik.de</a> providing country extracts to download. However, they turned out not to be precise enough for me. They use "simplified country border polygons" that "cut corners" and therefore miss addresses in areas near the borders. So I had to "split the planet" myself.</p>
<p>To do so, the first step was to extract <em>exact</em> country boundaries. Interestingly, these might change over time. Usually, it's minor modifications like slightly adjusting the border or correcting mistakes. But sometimes the border might move a bit more in "unstable" parts of the world. The point here is that these borders are not "definitive" but evolve slightly over time.</p>
<p>The next step is splitting the world into country extracts. Here again, it cannot naively be done in a single step: even 256 GB of RAM would not suffice to split the planet at once. So the splitting must be done in multiple stages: first into continents, then into regions, then into countries, so that each step fits in a "reasonable" amount of memory.</p>
<p>And cutting whole continents along a super precise boundary made up of millions of points is not efficient either. On the other hand, using the total bounds of a continent is pointless too. For example, the outer bounds of France alone would cover almost the whole world, since it possesses many islands around the globe as part of its territory. You get the point: some extra work must be done to simplify the geometry without losing anything, but without including too much either.</p>
<p>Then, there are ways or area relations that cross boundaries. For some things in the raw data, it is not always clear whether they are "closed lines" or "areas", and so on. It's full of technical details which make even producing what looks like simple "country extracts" challenging.</p>
<h3 id="heading-heterogenous-data">Heterogeneous data</h3>
<p>The OpenStreetMap <a target="_blank" href="https://wiki.openstreetmap.org/wiki/Planet.osm">raw data</a> is not a homogeneous, clearly defined dataset. It is a huge amount of points, lines and relations, each with completely arbitrary properties. For example, a statue might be a point with metadata indicating when it was built and by whom, along with some tourist guide number. Depending on where you look on the map, you may also notice different habits of mappers using a diversified arsenal of "<a target="_blank" href="https://wiki.openstreetmap.org/wiki/Tags">tags</a>" to describe things, and the community as a whole has different opinions on how to do things, for example with <a target="_blank" href="https://wiki.openstreetmap.org/wiki/Addresses">addresses</a>, which often have local flavours.</p>
<p>If you dig into the raw data of OpenStreetMap, you will find interesting things. For example, you will find tags like <code>addr:street=...</code> and <code>addr:city=...</code> which sound promising. These are also very simple (and quick) to extract since they are attached directly to the data. Great, right? Well, it would be if this data were complete, but it is far from it. Depending on the country you are looking at, it might be mostly widespread or barely used. Even where it's there, the coverage and the content are usually quite fuzzy. For example, the street might be named "Wall Street" on one building while another building uses "Wall St.". Likewise, the city on one building may be "N.Y. City" while another uses "New York". Postcodes may also be written on individual houses, but not match the postal boundary accurately, etc. This makes processing these tags directly error-prone. It's better than nothing, but there are ways to do better.</p>
<p>That is what we did: spatial joins of house/street points into administrative/postal areas in order to extract as much information as possible. If those areas are not mapped, a fallback to the tags is used, but only as a "fallback", since they are usually not that precise.</p>
<h3 id="heading-manual-labor">Manual labor</h3>
<p>Doing this is quite some work. It's not just running a process and being done with it. It's craftsmanship, where you change a few lines of code and manually inspect the results. Just checking whether more streets/houses/addresses are produced is not sufficient either. It might be that the output is of worse quality because street names are duplicated or listed in the wrong "areas", or some other data mistakes. It might also be that the "couple of lines change" works perfectly for one country but breaks another because of local differences, for example the presence or absence of postal codes.</p>
<p>Sometimes, you also see odd things in the data. When this happens, you usually spend some time investigating "why" it is so. Is it the raw map data that is strange? Is it some situation you did not think of? Is there a bug in the code? Is some third-party library not working as expected?... It's really full of weird things, from buildings on the map mistakenly tagged with "national boundary" tags to sudden performance drops in third-party libraries when calling a certain function.</p>
<h2 id="heading-addresses-are-crazy">Addresses are crazy</h2>
<p>Below, I will illustrate how crazy addresses are. They are not homogeneous worldwide; they are full of regional quirks.</p>
<h3 id="heading-is-it-a-country-or-not">Is it a country or not?</h3>
<p>You may think that something as basic as countries and boundaries is clear-cut. But it is not. Take Kosovo for example.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/18/Europe-Republic_of_Kosovo.svg/250px-Europe-Republic_of_Kosovo.svg.png" alt="Location in Europe" class="image--center mx-auto" /></p>
<p>For half of the world (marked in red), Kosovo constitutes a province of Serbia, while for the other half of the world (marked in blue) Kosovo is recognized as an independent country.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Kosovo_relations.svg/400px-Kosovo_relations.svg.png" alt class="image--center mx-auto" /></p>
<p>...and that's not unique to Kosovo. There are plenty of regions in the world where territory is disputed, where borders shift with local wars and where sovereignty depends on who you ask.</p>
<p>What stance do I take here? I simply use the list of countries as defined by the United Nations, identified by their <a target="_blank" href="https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2">ISO 3166-1</a> country codes. It might not be ideal, but it is pragmatic.</p>
<h3 id="heading-a-city-with-many-borders">A city with many borders</h3>
<p>On a smaller scale, borders can be crazy too. For example, check out the little town of <a target="_blank" href="https://en.wikipedia.org/wiki/Baarle-Nassau">Baarle-Nassau</a>, located in the south of the Netherlands, near the Belgian border.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Baarle-Nassau_-_Baarle-Hertog-en.svg/365px-Baarle-Nassau_-_Baarle-Hertog-en.svg.png" alt class="image--center mx-auto" /></p>
<p>This town contains 22 small <a target="_blank" href="https://en.wikipedia.org/wiki/Exclave">exclaves</a> of the Belgian town <a target="_blank" href="https://en.wikipedia.org/wiki/Baarle-Hertog">Baarle-Hertog</a>, some of which contain counter-enclaves of Baarle-Nassau. The borders cross streets in the middle, sometimes multiple times, and a single house might have a Belgian address on one side and a Dutch address on the other. As you see, extracting addresses can quickly become challenging. ;)</p>
<h3 id="heading-a-city-center-without-street-names">A city center without street names</h3>
<p>Not all addresses are based on "streets". Take a look at <a target="_blank" href="https://en.wikipedia.org/wiki/Mannheim">Mannheim</a>, for example. There, the city center is divided like a <a target="_blank" href="https://en.wikipedia.org/wiki/Mannheim#Block_numbering_and_computer_mapping">big grid</a>.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2d/Mannheim_Quadratstadt_beschriftet.png/708px-Mannheim_Quadratstadt_beschriftet.png" alt="File:Mannheim Quadratstadt beschriftet.png" class="image--center mx-auto" /></p>
<p>There, each "block" has an identifier, like "C3", while the streets are unnamed. Likewise, a house number does not belong to a "street" but to a block. In other words, your address might be "C3, 17" if you live in building 17 of block "C3".</p>
<h3 id="heading-fancy-house-numbers">Fancy house numbers</h3>
<p>Do you want to use regexes to validate house numbers? Well, that might not really work out. For example, the following image shows a valid Vietnamese house number, near the coast of Ho Chi Minh City.</p>
<p><img src="https://qph.cf2.quoracdn.net/main-qimg-28c36359aa26de2b8c2d68a1b52de29d-pjlq" alt class="image--center mx-auto" /></p>
<hr />
<blockquote>
<p>The world is full of surprises. Also regarding addresses, it's full of diversity and local quirks and I believe there is nothing that does not exist.</p>
</blockquote>
<h2 id="heading-honorable-mentions">Honorable mentions</h2>
<p>There are also two noteworthy open source projects trying to bring addresses into the public domain.</p>
<p><a target="_blank" href="http://openaddresses.io"><strong>openaddresses.io</strong></a></p>
<p>This is probably the most famous one. It works by running various "scraping scripts" against various "raw data sources". This has a few drawbacks though, directly related to the approach.</p>
<ul>
<li><p>the licensing is problematic. Basically, it says "use this data according to the license of the data source" ...which is not obvious, since the original license is not directly linked, is often in the native language, and its terms make the usage of this data questionable.</p>
</li>
<li><p>the coverage is lacking</p>
</li>
<li><p>the scraping sometimes breaks or is outdated because of changes in the raw source</p>
</li>
<li><p>despite being "open", some things are obfuscated, making reproduction or direct downloads difficult</p>
</li>
</ul>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9s76uc80d9e9rdylfj00.png" alt="Image description" /></p>
<p><a target="_blank" href="http://osmnames.org"><strong>osmnames.org</strong></a></p>
<p>Despite being less known, this is IMHO a better source of addresses. It is based on extracting addresses from OpenStreetMap and therefore has worldwide coverage and a homogeneous license: the "ODbL - Open Database License".</p>
<p>The only drawbacks it has, IMHO, are:</p>
<ul>
<li><p>the lack of postal codes</p>
</li>
<li><p>the <code>admin_level</code> mapping is not ideal</p>
</li>
</ul>
<p>The lack of postal codes may seem like a detail, but it is crucial for addresses. Without them, addresses are simply incomplete. Since this project is open source, I also tried to improve it by adding postal codes (see issue), but it turned out to be too difficult/challenging for me, mostly because I am unfamiliar with PostGIS. The code, however, is of good quality. This led me to the current project.</p>
<p>The second issue is more subtle and leads to missing addresses in some countries because administrative boundaries are not properly mapped. Also, the code is not well suited for experimentation, and from my understanding, there were ways to "get more" out of the raw OpenStreetMap data dumps than they did.</p>
]]></content:encoded></item></channel></rss>