The Personas Project grew out of a brief vision paper I wrote in November 2023. I was thinking about how I might use AI as an alternative way to share things I have learned over the years. I had been working on various books and outlines. I had outlined eight different book projects and amassed a trove of supporting documents, lectures and presentations I had written. And I thought, I'm going to need another lifetime to complete eight books. But I have all of this content and material that I wanted to include in the books. Well, what if I were able to put that content into a personal large language model, put a ChatGPT-like front end on it, and allow people to pose questions? I could include a list of recommended questions to get started, but then people could pose their own questions and have a conversation with my books, so to speak.
I started thinking about the two classes I taught. They ran during the pandemic, so everything was recorded on Zoom. I took all of those recordings each year, fed them into Otter.AI, and generated transcripts, as I've been doing for my weekly student meetings since then. I fed the transcripts into a book draft. The rough draft for each class came to a manuscript of more than 450 pages, generated basically from the lectures, the guest speakers, and the student discussions during the class. I had to go through and anonymize everything, of course, to protect everybody's identity. So I thought, wouldn't it be interesting to query those conversations?
And what about all of the work from when I was a CIO? What about the presentations and documents that were written on IT strategy, NetHope and things like that? And then further back, to prior jobs. I have 45 years of experience in IT, and I'm a packrat; I saved everything, and I've got hundreds of documents. So I thought, well, why not have something where you can feed in all of that information and then chat with it?
And it actually addresses two important problems that we have with some of the language models now. One is hallucinations. It reduces them, because it's just your information that's in there as the basis. It doesn't eliminate them, and you can see that in the transcripts from our testing, which I can share. The other is copyright, because they're my documents, my information. And I thought, well, isn't this really a special case of what corporations want to do? They want to take their documents and allow internal and external conversations to flourish.
For example, take the help desk and the knowledge base that's produced from all of the help desk calls. Or take the case of a former student of mine, who is looking at all of the documentation that's available to patients and clients in a healthcare system. Can you put that into a model and have a chatbot that people can talk with about it?
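For readers who like to see an idea in code, here is a minimal sketch of what "chatting with your own documents" can look like underneath. It is not how our project is built and it is not any product's API. It assumes a toy word-overlap search in place of a real retrieval engine, and every name in it (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`) is hypothetical. The point is only that the question is answered from passages pulled out of your own files, which is why hallucinations go down but do not disappear.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, passage):
    """Count the word occurrences the question and the passage share."""
    q_counts = Counter(tokenize(question))
    p_counts = Counter(tokenize(passage))
    return sum(min(count, p_counts[word]) for word, count in q_counts.items())


def retrieve(question, passages, k=2):
    """Return up to k passages that share at least one word with the question."""
    scored = [(overlap_score(question, p), p) for p in passages]
    matches = [(s, p) for s, p in scored if s > 0]
    # Highest-scoring passages first; ties keep their original order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches[:k]]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that asks a model to answer only from the passages."""
    context = retrieve(question, passages, k)
    context_text = "\n".join(f"- {p}" for p in context)
    if not context_text:
        context_text = "(no matching passages found)"
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context_text}\n\n"
        f"Question: {question}"
    )
```

The text returned by `build_prompt` is what would be handed to a language model. A real system would use a much better search than word overlap, but the shape is the same: find the relevant passages first, then ask the question against them.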
The former student sent me a note to catch up at one point. I think it was a best wishes for Halloween note, if I remember right. He asked whether there was anything I needed help with. And I said, well, listen, I've been toying with this idea. And I sent him a copy of the vision paper and said, what do you think? Is this something interesting? And he said, yeah, he could see some value in this, even personally, because his wife is working on a PhD and she's gathering research in supply chain management. Wouldn't it be useful to have help summarizing that research and a way to query it?
So I said to him, well, how about we build a minimum viable product to demonstrate this? So he developed an MVP to show how it could work. We then created a team of student volunteers from my Data4Good group, demoed it to them, and began a "what if" discussion. And the timing was great because my former student, in his investigation and discovery, found some products, like AnythingLLM and Ollama, that already provided the pieces for what we wanted to do. We would only need to assemble and integrate them, all without any coding!
He also came up with the concept of Personas. My concern was that some of my files would be appropriate for one audience, personal files for another, and classroom files for another. So he suggested creating three datasets and “HappGPT” chatbots for my different audiences, which I’ve since called “Ask the CIO,” “Ask the Professor,” and “Ask Grandpa.”
For two of these datasets, we’re releasing a test version on this blog:
The test sites are limited to 100 queries per day, so don’t get carried away 🙂. Please leave some comments on what works well and what doesn’t. I appreciate your feedback. Let the conversation begin!