Show HN: Cowork/Codex DOCX plugin. Uses 2x fewer tokens than the docx skill

Posted by tanin 2 hours ago


Hi HNers,

I'd like to share our DOCX plugin for Cowork and Codex.

It uses 2-5x fewer tokens than the traditional docx skill because it neither writes code nor executes Python/Node scripts. It is also much more reliable.

Our DOCX plugin converts docx<->html bidirectionally. This means AI only operates on HTML. AI is excellent and very efficient when it comes to HTML.

Most libraries (if not all) support docx->html, but none supports html->docx. This is what is novel about our approach.

Here's the demo: https://drive.google.com/file/d/1UNlUJYwkNX3NiANDkLLb3UoRSms...

We've been using it in-house for redlining legal documents, and we love it. If you redline docx files, please give it a try: https://github.com/LegalRabbit-AI/legalrabbit-docx-claude-pl...

Comments

Comment by dev-kdrainc 11 minutes ago

Thanks for sharing! I like your approach to working under the hood! Great job

Comment by xms17189 1 hour ago

Interesting approach. Does keeping the model in HTML also preserve enough structure for tracked changes/comments, or do you handle those as a separate layer when converting back to DOCX?

Comment by tanin 1 hour ago

Thank you!

My thesis is that an intermediate layer would eventually end up being equivalent to the docx format, so I've decided not to have any intermediate representation.

We convert the docx to HTML and send it to the AI. When the AI rewrites the HTML and sends it back, we diff the rewritten HTML against the docx's document.xml and apply the modifications. This is a simplified explanation; there are a bunch of validations and processing steps going on.

Regarding tracked changes and comments, we simply invent new HTML tags for them, e.g. <ins>, <del>, <commentRangeStart>, etc.
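
To make the diff step concrete, here is a rough sketch of the general idea, not the plugin's actual code. It assumes plain-text paragraphs rather than real document.xml, and the function name `redline` is hypothetical. It compares the original wording with the rewritten wording word by word using Python's standard difflib and marks the changes with <ins> and <del>:

```python
import difflib
from html import escape


def redline(original: str, rewritten: str) -> str:
    """Return HTML that marks word-level changes with <ins> and <del>.

    Hypothetical helper for illustration only; a real docx pipeline would
    have to map these changes back onto runs in document.xml.
    """
    old_words = original.split()
    new_words = rewritten.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = escape(" ".join(old_words[i1:i2]))
        new_chunk = escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(old_chunk)
        if tag in ("delete", "replace"):
            # Words removed from the original become a tracked deletion.
            parts.append(f"<del>{old_chunk}</del>")
        if tag in ("insert", "replace"):
            # Words added by the rewrite become a tracked insertion.
            parts.append(f"<ins>{new_chunk}</ins>")
    return " ".join(parts)


if __name__ == "__main__":
    print(redline("The tenant shall pay rent monthly.",
                  "The tenant must pay rent monthly."))
```

A word-level diff is only one possible granularity; diffing at the level of HTML elements or docx runs would preserve formatting better, at the cost of a more involved matching step.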