Ask HN: What's a good format to submit CSV data for LLMs?
Posted by JimsonYang 1 day ago
I need to submit about 1,000 rows of data to an LLM so I can ask it for trends within the data. If I use JSON, the GPT tokenizer shows roughly 40 tokens per row (because the headers are repeated in every row, which is inefficient). That means about 40k input tokens, which would definitely put me in context rot (hallucination) territory. And I've heard that using CSV is very inaccurate. Any suggestions?
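A rough way to see the header overhead described above: serialize the same rows as a JSON list of objects, as CSV, and as a "columnar" JSON object that lists the keys only once. This is a minimal sketch; the sample rows and column names are made up, and character counts are only a proxy for tokens, so run the output through the tokenizer for your actual model to get real numbers.

```python
import csv
import io
import json

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"date": "2024-01-01", "region": "north", "sales": 120},
    {"date": "2024-01-02", "region": "south", "sales": 95},
    {"date": "2024-01-03", "region": "north", "sales": 130},
]


def as_json(records):
    # List of objects: every row repeats every key name.
    return json.dumps(records)


def as_csv(records):
    # Header row written once, then only the values for each row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def as_columnar_json(records):
    # Still JSON, but keys appear once and each row is a plain array.
    columns = list(records[0].keys())
    return json.dumps(
        {"columns": columns, "rows": [[r[c] for c in columns] for r in records]}
    )


if __name__ == "__main__":
    for name, fn in [("json", as_json), ("csv", as_csv), ("columnar json", as_columnar_json)]:
        print(f"{name}: {len(fn(rows))} chars")
```

With many rows the gap grows roughly linearly, since the list-of-objects form pays for the key names on every row while the other two pay for them once.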
Comments
Comment by Leftium 18 hours ago
I wanted to figure out reasonable ranges for daily/hourly precipitation for https://weather-sense.leftium.com. Claude wrote a script that called the Open-Meteo API to collect hourly stats for a few cities for an entire year (8,000+ rows), then just reported the 80th, 90th, etc. percentiles and recommended ranges.
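The pattern in this comment (let code reduce the rows, then hand the LLM only the summary) might look something like the sketch below. It assumes the hourly values are already in a Python list; the data fetch from Open-Meteo is left out, and the readings in the example are made up. It uses only the standard library's `statistics.quantiles` (Python 3.8+).

```python
import statistics


def percentile_summary(values, percentiles=(50, 80, 90, 95, 99)):
    """Return {p: value} for the requested percentiles of `values`.

    statistics.quantiles(..., n=100) returns the 99 cut points for
    percentiles 1..99, so percentile p sits at list index p - 1.
    """
    if len(values) < 2:
        raise ValueError("need at least two data points")
    if any(not 1 <= p <= 99 for p in percentiles):
        raise ValueError("percentiles must be between 1 and 99")
    # "inclusive" treats the data as the whole population (min..max).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}


if __name__ == "__main__":
    # Hypothetical hourly precipitation readings in mm (mostly dry hours).
    hourly_mm = [0.0] * 70 + [0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 5.0] * 3
    for p, v in percentile_summary(hourly_mm).items():
        print(f"p{p}: {v:.2f} mm")
```

The resulting handful of numbers is small enough to paste into a prompt, which sidesteps the 40k-token question entirely.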
Comment by mierz00 1 day ago
I'm not sure this would work for your use case, but you could use an LLM to classify each row into a value, then hard-code the trend analysis you're looking for.
For example, if you're analysing something like support tickets, use an LLM to classify the sentiment of each ticket; then you can plot the sentiment on a graph and see whether it's trending up or down.
Comment by eimrine 1 day ago
Comment by JimsonYang 1 day ago