Private.txt | 240k

The most relevant reference is a project by Edward Donner, in which he fine-tuned a Large Language Model (LLM) on his own private history of 240,805 text messages to create a digital simulation of himself.

The data consisted of SMS, iMessage, and WhatsApp conversations with 288 people.

The messages were cleaned by removing group chats and unknown contacts, then grouped into "chunks" of 200 tokens to serve as training prompts for the AI.
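
The cleaning and chunking steps might look roughly like the sketch below. It is not the project's actual code: the `Message` record, the function names, and the whitespace-based token count are all assumptions made for illustration (a real pipeline would count tokens with the model's own tokenizer), and the source does not say whether chunks were built per conversation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical record for one exported message."""
    contact: Optional[str]  # None when the sender is not a known contact
    is_group: bool          # True when the message came from a group chat
    text: str


def clean_messages(messages: List[Message]) -> List[Message]:
    """Drop group-chat messages and messages from unknown contacts."""
    return [m for m in messages if not m.is_group and m.contact is not None]


def chunk_messages(messages: List[Message], max_tokens: int = 200) -> List[str]:
    """Greedily pack consecutive messages into chunks of at most max_tokens.

    Tokens are approximated by whitespace-separated words (an assumption).
    A single message longer than max_tokens is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for m in messages:
        n = len(m.text.split())
        # Start a new chunk when adding this message would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(m.text)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Calling `chunk_messages(clean_messages(all_messages))` would then yield a list of text chunks, each of which could serve as one training prompt.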