Ask HN: What's a good format to submit CSV data for LLMs?

Posted by JimsonYang 1 day ago

I need to submit like 1000 rows of data to an LLM so I can ask it for trends within the data. If I use JSON, the GPT tokenizer shows about 40 tokens per row (because the headers are repeated in every row, which is inefficient). That means 40k input tokens, which would definitely put me in context rot (hallucination) territory. And I heard using CSV was very inaccurate. Any suggestions?
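To make the header-repetition overhead concrete, here is a rough sketch that serializes the same rows as a JSON array of objects and as CSV, then compares sizes. The sample columns (`order_id`, `region`, `amount_usd`) are invented for illustration, and character count is only a loose proxy for token count; a real tokenizer would give different numbers. It says nothing about how accurately a model reads either format.

```python
import csv
import io
import json


def to_json_rows(rows):
    """Serialize as a JSON array of objects: every row repeats every header."""
    return json.dumps(rows)


def to_csv(rows):
    """Serialize as CSV: the headers appear once, on the first line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data, invented for illustration only.
rows = [{"order_id": i, "region": "EU", "amount_usd": 10 + i} for i in range(1000)]

json_chars = len(to_json_rows(rows))
csv_chars = len(to_csv(rows))
print(f"JSON: {json_chars} chars, CSV: {csv_chars} chars, "
      f"ratio {json_chars / csv_chars:.1f}x")
```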

Comments

Comment by Leftium 18 hours ago

Instead of directly passing the CSV data to the LLM, have the LLM write a script that will read the CSV and output the trends. (Then it can run the script and report on the results.)

I wanted to figure out reasonable ranges for daily/hourly precipitation for https://weather-sense.leftium.com. Claude wrote a script that called the Open-Meteo API to collect hourly stats for a few cities for an entire year (8000+ rows), then just reported the 80th, 90th, etc. percentiles and recommended ranges.
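A minimal sketch of the kind of script an LLM might write for this, so that only a handful of summary numbers go back into the context instead of thousands of rows. This is not the script Claude produced; the column name `precip_mm` and the sample rows are assumptions made up for the example, and it uses only the Python standard library.

```python
import csv
import io
import statistics


def column_percentiles(csv_text, column, percentiles=(50, 80, 90, 95, 99)):
    """Return {percentile: value} for one numeric column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is empty or missing.
    values = [float(row[column]) for row in reader if row[column] not in ("", None)]
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}


# Hypothetical hourly precipitation rows, invented for illustration.
sample_csv = """hour,precip_mm
0,0.0
1,0.2
2,1.5
3,0.0
4,4.1
"""

for p, value in column_percentiles(sample_csv, "precip_mm", (80, 90)).items():
    print(f"{p}th percentile: {value:.2f} mm")
```

With a real file you would read the text from disk and pass the actual column name; the LLM then only needs to see the printed summary.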

Comment by mierz00 1 day ago

We analyse thousands of lines from a CSV using an LLM. The only thing that worked for us was to send each line individually and analyse them one by one.

I’m not sure if that would work in your use case, but you could classify each line into a value using an LLM, then hard-code the trends you are looking for.

For example, if you’re analysing something like support tickets, use an LLM to classify the sentiment of each one. Then you can plot the sentiment on a graph and see if it’s trending up or down.
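A rough sketch of that classify-then-aggregate idea. The classifier is passed in as a function so each ticket is scored on its own; `keyword_sentiment` below is a keyword stand-in, not an LLM, and the tickets are invented. In a real setup it would be swapped for one LLM call per row.

```python
from collections import defaultdict
from statistics import mean


def keyword_sentiment(text):
    """Stand-in classifier (NOT an LLM): returns -1, 0, or 1.

    A real pipeline would replace this with one LLM call per ticket.
    """
    lowered = text.lower()
    if any(word in lowered for word in ("broken", "refund", "angry")):
        return -1
    if any(word in lowered for word in ("thanks", "great", "love")):
        return 1
    return 0


def sentiment_by_day(tickets, classify):
    """Classify each (day, text) ticket independently, then average per day."""
    scores = defaultdict(list)
    for day, text in tickets:
        scores[day].append(classify(text))
    return {day: mean(vals) for day, vals in sorted(scores.items())}


# Hypothetical tickets, invented for illustration.
tickets = [
    ("2024-01-01", "App is broken, I want a refund"),
    ("2024-01-01", "Login page is slow"),
    ("2024-01-02", "Thanks, the fix works great"),
    ("2024-01-02", "Love the new dashboard"),
]

daily = sentiment_by_day(tickets, keyword_sentiment)
for day, score in daily.items():
    print(day, score)
```

The per-day averages are plain numbers, so the trend itself can be plotted or computed with ordinary code rather than asked of the model.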

Comment by eimrine 1 day ago

You can use good old algorithms to search for your specific trends. Just ask the LLM how to code them. Any algo you might need is somewhere in Donald Knuth's books.
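As one example of what such classic algorithms look like, here is a small sketch of a moving average and a least-squares trend slope in plain Python. These two are just common choices for trend detection, picked for illustration; they are not taken from any particular book, and the sample series is made up.

```python
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    # Slide the window: add the new value, drop the oldest one.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical series, invented for illustration.
series = [3, 4, 4, 6, 7, 9, 8, 11]
print("smoothed:", moving_average(series, 3))
print("slope per step:", trend_slope(series))
```

A positive slope means the series is rising on average; the moving average smooths out noise before you eyeball or plot it.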

Comment by JimsonYang 1 day ago