<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Divyam Ahuja</title>
<link>http://localhost:1313/</link>
<description>Recent content on Divyam Ahuja</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate>Fri, 08 May 2026 00:00:00 +0000</lastBuildDate>
<atom:link href="http://localhost:1313/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Hello World</title>
<link>http://localhost:1313/posts/hello-world/</link>
<pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate>
<guid>http://localhost:1313/posts/hello-world/</guid>
<description><p>Welcome to my new blog! I&rsquo;ve rebuilt my personal website from scratch using <a href="https://gohugo.io/">Hugo</a>, a fast static site generator written in Go.</p>
<h2 id="why-hugo">Why Hugo?</h2>
<p>I wanted something minimal, fast, and that gets out of the way. Hugo checks all the boxes:</p>
<ul>
<li><strong>Blazing-fast builds</strong> — the entire site builds in under 100 ms</li>
<li><strong>Markdown-first</strong> — I write posts in plain Markdown files</li>
<li><strong>Zero JavaScript</strong> — the site ships pure HTML and CSS</li>
<li><strong>Built-in features</strong> — syntax highlighting, RSS feeds, and sitemaps, all out of the box</li>
</ul>
<h2 id="the-design">The Design</h2>
<p>The design is inspired by the <a href="https://probberechts.github.io/hexo-theme-cactus/">Cactus Dark</a> theme — a minimalist, terminal-inspired aesthetic with monospace typography and a dark color scheme. I built the theme from scratch for full control.</p></description>
</item>
<item>
<title>Building a PDF Translation Pipeline</title>
<link>http://localhost:1313/posts/pdf-translation-pipeline/</link>
<pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate>
<guid>http://localhost:1313/posts/pdf-translation-pipeline/</guid>
<description><p>Recently, I needed to translate a PDF document from Hindi to English. Sounds simple enough, right? Turns out, it&rsquo;s a surprisingly deep rabbit hole.</p>
<h2 id="the-pipeline">The Pipeline</h2>
<p>The approach I settled on follows this flow:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">PDF → Images → OCR → Translation → Rendered Images → PDF
</span></span></code></pre></div><p>Each step has its own set of challenges:</p>
<ol>
<li><strong>PDF to Images</strong>: Convert each page to a high-DPI image for better OCR accuracy</li>
<li><strong>OCR</strong>: Extract text with position data using PaddleOCR</li>
<li><strong>Translation</strong>: Run extracted text through NLLB (No Language Left Behind)</li>
<li><strong>Rendering</strong>: Paint translated text back onto the original image</li>
<li><strong>Assembly</strong>: Combine rendered images back into a PDF</li>
</ol>
<h2 id="lessons-learned">Lessons Learned</h2>
<ul>
<li><strong>DPI matters a lot</strong> — bumping from 150 to 300 DPI dramatically improved OCR accuracy for Hindi text</li>
<li><strong>Font rendering is hard</strong> — getting translated text to fit in the same bounding boxes required careful font size calculation</li>
<li><strong>Fallback strategies</strong> — TrOCR serves as a fallback when PaddleOCR fails on certain text regions</li>
</ul>
<p>The code is messy but it works. Sometimes that&rsquo;s enough.</p></description>
</item>
<item>
<title>About</title>
<link>http://localhost:1313/about/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>http://localhost:1313/about/</guid>
<description><p>Hi, I&rsquo;m a full-stack developer who loves building things: hacking on side projects, exploring new technologies, and digging into the technical details of how things work.</p>
<p>I work across the stack, from crafting interactive frontends to designing robust backend systems. When I&rsquo;m not writing code, I&rsquo;m probably reading about distributed systems or tinkering with new tools.</p>
<p>I especially like deep-diving into protocols, performance optimization, and algorithms.</p></description>
</item>
<item>
<title>Projects</title>
<link>http://localhost:1313/projects/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>http://localhost:1313/projects/</guid>
<description></description>
</item>
<item>
<title>Resume</title>
<link>http://localhost:1313/resume/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>http://localhost:1313/resume/</guid>
<description></description>
</item>
</channel>
</rss>