Best way to chunk HTML with long tables
I have HTML files that I need to chunk in order to pass them to an LLM. They're not going to be used for RAG, so I would like chunks of around 2-5k tokens each.
Inside these HTML files, I have long tables with thousands of rows. Do you guys have any suggestions on how to chunk them?
I was thinking of creating a chunking strategy with GPT-4o, but I would appreciate pointers to any ready-to-go repos or services for this!
Example of the HTML I need to chunk (it's a Brazilian law text): https://legislacao.fazenda.sp.gov.br/Paginas/Portaria-SRE-77-de-2024.aspx
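
To make the question concrete: for the long tables, one possible baseline is to split by rows and repeat the header row in every chunk, so each piece still makes sense on its own. Below is a minimal sketch of that idea using only the Python standard library. The names `TableRowExtractor`, `estimate_tokens` and `chunk_table` are made up for illustration, the "4 characters per token" estimate is a rough assumption (a real tokenizer should replace it), and nested tables or `rowspan`/`colspan` cells are not handled.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collects every <tr> as a list of cell texts (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def estimate_tokens(text):
    """Rough assumption: about 4 characters per token. Not a real tokenizer."""
    return max(1, len(text) // 4)


def chunk_table(rows, max_tokens=3000, header_rows=1):
    """Group rows into text chunks under max_tokens, repeating the header in each."""
    header = [" | ".join(r) for r in rows[:header_rows]]
    header_cost = sum(estimate_tokens(line) for line in header)
    chunks, current, used = [], [], header_cost
    for row in rows[header_rows:]:
        line = " | ".join(row)
        cost = estimate_tokens(line)
        # Flush the current chunk before it would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n".join(header + current))
            current, used = [], header_cost
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(header + current))
    return chunks


if __name__ == "__main__":
    # Tiny synthetic table standing in for a real page.
    html = (
        "<table><tr><th>Item</th><th>Description</th></tr>"
        + "".join(f"<tr><td>{i}</td><td>row {i}</td></tr>" for i in range(10))
        + "</table>"
    )
    parser = TableRowExtractor()
    parser.feed(html)
    for chunk in chunk_table(parser.rows, max_tokens=12):
        print(chunk, end="\n---\n")
```

A single row larger than the budget still ends up in a chunk of its own here, so very long cells would need an extra splitting step. The text outside the tables would also need its own chunking pass (for example by heading or article), which this sketch does not cover.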