Posts

Showing posts from March 2026

A KNIME component to unzip files

I'm sharing a simple KNIME component that unzips files into a directory of your choice. KNIME is a true goldmine of nodes. Some are part of our daily workflows and feel almost "obvious". Others are used only occasionally and often stay under the radar. The interesting part? Once you have a clear idea of what you want to achieve, it's usually just a matter of searching a bit: chances are that a node (or component) already exists to do exactly that. You can download the component here: https://hub.knime.com/gio_bi/spaces/Public/unzip~xFugD01RdDaXInyV/current-state
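For readers who want to see the operation itself, here is a minimal Python sketch of "unzip an archive into a selected directory" using the standard `zipfile` module. It is not how the component is built internally; the function name `unzip_to` and its parameters are my own, chosen for illustration.

```python
import zipfile
from pathlib import Path


def unzip_to(archive, target_dir):
    """Extract every member of `archive` into `target_dir`.

    Hypothetical helper for illustration only; it is not part of KNIME
    or of the shared component.
    """
    target = Path(target_dir)
    # Create the destination folder (and any parents) if it is missing.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        # Return the paths of the archive members under the target folder.
        return [target / name for name in zf.namelist()]
```

Calling `unzip_to("data.zip", "extracted")` would place the contents of a `data.zip` archive under an `extracted` folder; both names are placeholders.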

LumberChunker: Long-Form Narrative Document Segmentation - ML.CMU

LumberChunker is a method leveraging an LLM to dynamically segment documents into semantically independent chunks. It iteratively prompts the LLM to identify the point within a group of sequential passages where the content begins to shift. Long-form narrative documents usually have an explicit structure, such as chapters or sections, but these units are often too broad for retrieval tasks. At a lower level, important semantic shifts happen inside these larger segments without any visible structural break. When we split text only by formatting cues, like paragraphs or fixed token windows, passages that belong to the same narrative unit may be separated, while unrelated content can be grouped together. This misalignment between structure and meaning produces chunks that contain incomplete or mixed context, which reduces retrieval quality and affects downstream RAG performance. For this reason, segmentation should aim to create chunks that are semantically independent, rather than relyin...
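The iterative loop described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the authors' implementation: `find_shift` stands in for the LLM prompt, `max_words` is a hypothetical size limit for each group of passages, and `toy_find_shift` is a deterministic stub that reads a topic tag so the sketch runs without a model.

```python
from typing import Callable, List, Sequence


def lumber_chunk(
    passages: Sequence[str],
    find_shift: Callable[[Sequence[str]], int],
    max_words: int = 500,  # hypothetical group size limit, not from the post
) -> List[List[str]]:
    """Split `passages` into chunks by repeatedly asking `find_shift`
    where the content of the current group starts to change."""
    chunks: List[List[str]] = []
    start = 0
    while start < len(passages):
        # Grow a group of sequential passages up to roughly max_words,
        # always taking at least one passage.
        end, words = start, 0
        while end < len(passages) and (words < max_words or end == start):
            words += len(passages[end].split())
            end += 1
        group = passages[start:end]
        # find_shift returns the index inside the group where the content
        # begins to shift; clamp it so every iteration makes progress.
        shift = max(1, min(find_shift(group), len(group)))
        chunks.append(list(group[:shift]))
        # The next group starts at the detected shift point.
        start += shift
    return chunks


def toy_find_shift(group: Sequence[str]) -> int:
    """Stand-in for the LLM call: passages carry a 'TOPIC:' prefix and the
    shift is the first passage whose prefix differs from the first one."""
    first_topic = group[0].split(":", 1)[0]
    for i, passage in enumerate(group):
        if passage.split(":", 1)[0] != first_topic:
            return i
    return len(group)  # no shift detected inside this group
```

In a real setup, `find_shift` would wrap a prompt that shows the model the numbered passages of the group and asks for the one where the content begins to shift.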