• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of and finding treatments for complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia (FM), long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.


Creating an AI-based project to provide answers from this forum

Messages
26
I thought it might be worthwhile to use AI to answer questions using all of the posts on this forum for inspiration. There are millions of posts on this forum and some good information might be lost in some earlier discussions.

I'm currently scraping every post from the forum. When a user asks the chatbot a question, the few dozen posts whose meaning is most similar to the question are first retrieved using an embedding model. Then the question, along with those relevant posts, is presented to the AI, which summarizes them and answers in natural language.
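The retrieve-then-answer step can be sketched roughly like this. This is a toy bag-of-words similarity standing in for a real embedding model, and all the names and sample posts are illustrative, not from the actual pipeline:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector.
    A real pipeline would use a trained embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def top_k(question, posts, k=3):
    """Return the k posts most similar to the question."""
    q = embed(question)
    return sorted(posts, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

posts = [
    "graded exercise therapy made my symptoms worse",
    "I found vitamin d helped my fatigue in winter",
    "new forum members should read the rules first",
]
print(top_k("did exercise therapy help anyone?", posts, k=1))
```

The retrieved posts would then be pasted into the prompt alongside the question.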

Here are some examples of it answering questions. Note that it currently only has access to roughly the earliest 50% of the forum's posts.
Question: What are some comorbid conditions with CFS?
GPT-3.5's Response: Based on the provided forum posts, here are some comorbid conditions associated with Chronic Fatigue Syndrome (CFS):

1. Cancer: Some CFS patients may develop a cancer similar to one found in AIDS.
URL: https://forums.phoenixrising.me/threads/1/post-229984

2. Tuberculosis: CFS patients, like AIDS patients, may be especially susceptible to tuberculosis.
URL: https://forums.phoenixrising.me/threads/1/post-229984

3. AIDS Dementia: CFS patients have a brain defect similar to that found in AIDS dementia.
URL: https://forums.phoenixrising.me/threads/1/post-229984

4. Weight Fluctuation: Significant weight fluctuation can occur in CFS patients.
URL: https://forums.phoenixrising.me/threads/1/post-229984

5. Dizziness: Dizziness can be a severe problem in CFS.
URL: https://forums.phoenixrising.me/threads/1/post-229984

6. Chronic Sinusitis: Like AIDS, sinusitis can become chronic in CFS patients.
URL: https://forums.phoenixrising.me/threads/1/post-229984

7. Asthma: There is a suggestion that the rise in childhood asthma may be linked to the CFS epidemic.
URL: https://forums.phoenixrising.me/threads/1/post-229984

8. Endometriosis: Endometriosis seems to be common in women with CFS.
URL: https://forums.phoenixrising.me/threads/1/post-229984

9. Pregnancy Complications: Pregnancy can be a disaster for a CFS patient.
URL: https://forums.phoenixrising.me/threads/1/post-229984

10. Prostate Infections: Men with CFS can develop chronic prostate infections.
URL: https://forums.phoenixrising.me/threads/1/post-229984

11. Heart Murmur: CFS can cause a particular type of heart murmur.
URL: https://forums.phoenixrising.me/threads/1/post-229984

12. Bruising: Like AIDS patients, CFS patients may bruise more easily than healthy individuals.
URL: https://forums.phoenixrising.me/threads/1/post-229984

It's important to note that these associations mentioned in the forum posts are not universally agreed upon or supported by scientific research.

Question: how effective is graded exercise therapy?
GPT-3.5's Response: In response to the question about the effectiveness of graded exercise therapy (GET), there are mixed opinions among the forum posts. Some users express skepticism and concern about the potential harm of GET. They mention that many patients report worsening symptoms or adverse reactions after engaging in exercise programs. They argue that there is limited evidence supporting the effectiveness of GET, especially for severe cases of chronic fatigue syndrome (CFS).

Other posts mention studies and trials that suggest GET may be helpful for some patients, particularly those with milder or convalescent cases of CFS. They highlight the importance of a measured and gradual approach to exercise, with the support of a therapist. However, they also acknowledge that there is a lack of evidence showing sustained functional improvement from exercise in CFS.

Overall, the effectiveness of GET remains a topic of debate, with some patients reporting benefits and others experiencing negative effects. It is important for individuals to carefully consider their own condition and consult with healthcare professionals before engaging in any exercise program.

Question: is sunlight good for cfs?
GPT-3.5's Response: Based on the posts from the forum, there are mixed opinions regarding sunlight and its effects on individuals with Chronic Fatigue Syndrome (CFS). Some individuals reported experiencing negative symptoms, such as herpes outbreaks or feeling ill, after exposure to sunlight. They attributed this to the triggering of herpes virus infections or stress. Others mentioned that they felt better in colder temperatures and avoided sunlight due to sun sensitivity. However, there were also discussions about the potential benefits of sunlight, such as increased vitamin D production and its positive impact on mood. It is important to note that individual experiences may vary, and it is advisable to consult with a healthcare professional for personalized advice.

Sources:
1. Post discussing sun intolerance and herpes outbreaks: https://forums.phoenixrising.me/thr...-to-sunlight---not-just-the-heat.1/post-75518
2. Post mentioning sun intolerance and potential explanation: https://forums.phoenixrising.me/threads/Sulfur-Deficiency-and-Sun-Exposure.1/post-220499
3. Post discussing vitamin D and sun exposure: https://forums.phoenixrising.me/threads/Vitamin-D.1/post-912

I can't yet get it to answer exactly how I want, which is to summarize the posts and put the links inline in the main text every time. In the second example it didn't include links at all, even though I provide this prompt:

"You are provided with a question and relevant posts from a forum. It's crucial that you summarize these posts in your answer and give the urls to the posts as near to the summaries as possible. Example response: In one post, a user mentioned that they found soft drinks made their CFS worse. https://forums.phoenixrising.me/... In another post a user...https://forums..."
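A minimal sketch of how that instruction and the retrieved posts might be packed into a chat-completion request. The helper name and post fields here are hypothetical, not the project's actual code:

```python
def build_messages(question, retrieved_posts):
    """Assemble a chat-completion request: system instructions,
    then the retrieved posts, then the user's question."""
    system = (
        "You are provided with a question and relevant posts from a forum. "
        "Summarize these posts in your answer and give the URL of each post "
        "as near to its summary as possible."
    )
    context = "\n\n".join(
        f"POST ({p['url']}):\n{p['text']}" for p in retrieved_posts
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"{context}\n\nQUESTION: {question}"},
    ]

msgs = build_messages(
    "is sunlight good for cfs?",
    [{"url": "https://forums.phoenixrising.me/threads/Vitamin-D.1/post-912",
      "text": "Sun exposure raised my vitamin D levels."}],
)
print(msgs[0]["role"], len(msgs))
```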

With some more tweaks I can probably make it accessible to the public, although I need to figure out if it's going to be too expensive. With the ChatGPT model I'm using and the amount of data I'm giving it with every question, it comes out to costing about 3 cents per question.
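As a back-of-envelope check on that per-question figure, the cost is just tokens times price. The per-token prices below are assumptions for illustration only, not current OpenAI pricing:

```python
# Rough cost per question. Prices are assumed for illustration.
PRICE_IN_PER_1K = 0.0015   # assumed $/1K input tokens (GPT-3.5 class)
PRICE_OUT_PER_1K = 0.002   # assumed $/1K output tokens

def cost_per_question(input_tokens, output_tokens):
    return (input_tokens / 1000) * PRICE_IN_PER_1K + \
           (output_tokens / 1000) * PRICE_OUT_PER_1K

# ~18K tokens of retrieved posts plus question, ~600 tokens of answer
print(round(cost_per_question(18_000, 600), 4))  # roughly 3 cents
```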

Any comments or suggestions on this idea?
 

hapl808

Senior Member
Messages
1,831
If you haven't already, you may want to experiment with using the cheaper models for the embedding and maybe GPT4 for the actual answers. It's so much more expensive, but it does seem to be able to synthesize information more effectively.

I feel like the whole forum might be enough to fine tune the model itself, but getting the data in the right format becomes challenging. That would likely give a better answer than embeddings, but I haven't used embeddings enough to really get them right. It always felt like regurgitating 2-3 search results, rather than really synthesizing them. I also mostly used GPT3.5 for that, though, because GPT4 is so much more expensive.
 
Messages
26
If you haven't already, you may want to experiment with using the cheaper models for the embedding and maybe GPT4 for the actual answers. It's so much more expensive, but it does seem to be able to synthesize information more effectively.

I feel like the whole forum might be enough to fine tune the model itself, but getting the data in the right format becomes challenging. That would likely give a better answer than embeddings, but I haven't used embeddings enough to really get them right. It always felt like regurgitating 2-3 search results, rather than really synthesizing them. I also mostly used GPT3.5 for that, though, because GPT4 is so much more expensive.
Yeah, I'm currently making embeddings with a locally installed model. I think GPT4 is out of the question for answers because of the price. (Somewhere in the range of 20 cents per question for the smaller input size version if I use just under its maximum input, which is less than I used in these examples.)

I read that embeddings are better if you want the AI to have the data more in its "short-term memory" so that it can answer questions more explicitly and direct you to the exact post. I think fine-tuning is more like changing its style of thinking.

But a lot of the retrieved posts are barely related, or are just a few words long, so a lot of junk posts get sent and take up room. I'm thinking of maybe having an AI create a summary of each entire thread and using that for the embeddings.
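One cheap pre-filter before embedding (or summarizing) would be to drop the very short posts up front. This is just a sketch of the idea; the word-count threshold is an arbitrary illustration, not a tested value:

```python
def worth_embedding(post, min_words=25):
    """Crude pre-filter: skip very short posts before computing
    embeddings, since they rarely carry retrievable information."""
    return len(post.split()) >= min_words

posts = [
    "Thanks, glad it helped!",
    "After six months on low-dose naltrexone my PEM threshold improved "
    "noticeably; I started at 0.5 mg and titrated up to 4.5 mg over "
    "twelve weeks, tracking symptoms daily in a spreadsheet the whole time.",
]
print([worth_embedding(p) for p in posts])
```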

I'm very new to this though, and plan to experiment with fine-tuning.
 

hapl808

Senior Member
Messages
1,831
Embeddings are better for short-term-memory data and if you're providing links, but sometimes I wonder about the benefits of that. At a certain point, it's not really better than search - and sometimes it can be worse. The initial search algorithm is usually just word matching, so there's no AI involved. It's just like searching the forum and using the first five results.

It would be different if the AI could look at 20 embeddings and then decide which are the most useful, but I haven't really seen that (I'm pretty new to this stuff as well, though).

If you have enough power to run some of the better local models, maybe you can compare fine tuning vs embeddings. My feeling is fine tuning is enough of an improvement that it's still worth it despite the lack of linked source info. The advantage of fine tuning is it can use 1,000 posts on methylation to adjust its thinking, as opposed to trying to just pull 3 or 5 chunks to answer a question.
 
Messages
26
The initial search algorithm is usually just word matching,
From what I understand, embeddings are supposed to be based on the "meaning" of the text, and not explicit words. But yeah, the simple method I'm using is not giving great results. Probably 80% of posts on the forum provide very little useful information, but since there are so many, a lot of them end up getting matched to the question.

But good points. It's not super useful if I can only present five good posts with a query for the AI to analyze. I'm trying to get answers that kind of use all the accumulated information.
 

hapl808

Senior Member
Messages
1,831
It's an area I'm super interested in, so feel free to post any thoughts. My experience too was that the embeddings ended up matching just the words rather than deeper meaning, but some of that may be because embeddings are usually created with the simplest models (GPT-3, etc.) and then queried with better models. I was never totally clear if the sophistication of the model making the embeddings mattered, but most guidelines seem to use the very basic ones for cost, etc.
 
Messages
26
So I'm thinking of taking a different path. After some brainstorming with ChatGPT, I imagine a program that you can feed millions of posts from all over the web to, and it uses machine learning to find posts where people proclaim with high confidence that they found a treatment that cured or nearly cured them. Then maybe the data from those selected posts can be aggregated to find treatments that may have been lost in the depths of forums.

It seems like I need to find at least a few hundred or a thousand posts where someone expresses high confidence that they found a treatment, and use those to fine-tune an AI. It would be trained to look for instances of someone being highly confident that a treatment worked for them (it needs to be about themselves, to avoid the noise of hearsay) where both the condition and the treatment are mentioned.

And I'd annotate the training posts kind of like this:
"[FIRST_PERSON]I had severe [CONDITION]migraines[/CONDITION] for years, and nothing seemed to work. Then I tried [TREATMENT]grounding, where I walk barefoot on grass for 30 minutes every day[/TREATMENT]. I can [HIGH_CONFIDENCE]honestly say I've never felt better[/HIGH_CONFIDENCE]. My [CONDITION]migraines[/CONDITION] have reduced by 80%, and I'm [HIGH_CONFIDENCE]convinced this is a game-changer for me[/HIGH_CONFIDENCE].[/FIRST_PERSON]"
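Those bracket annotations are easy to turn into (label, text) training spans with a regex. This is just a sketch of the conversion step, using the tag set from the example above:

```python
import re

# Pull labeled spans out of a bracket-annotated training post.
# The tag set matches the annotation scheme sketched above.
TAG_RE = re.compile(r"\[(CONDITION|TREATMENT|HIGH_CONFIDENCE)\](.*?)\[/\1\]")

def extract_spans(annotated):
    """Return (label, text) pairs for every inner annotation tag."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(annotated)]

example = (
    "[FIRST_PERSON]I had severe [CONDITION]migraines[/CONDITION] for years. "
    "Then I tried [TREATMENT]grounding[/TREATMENT] and I can "
    "[HIGH_CONFIDENCE]honestly say I've never felt better[/HIGH_CONFIDENCE]."
    "[/FIRST_PERSON]"
)
print(extract_spans(example))
```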

I envision it being able to be used for any condition.

I'm currently manually checking every post here that has the word "cured", which isn't super efficient. I know I've had luck searching Google for the name of a condition plus the word "cured" and maybe "forum" and finding posts like that.
 

hapl808

Senior Member
Messages
1,831
That's an interesting approach. Possibly GPT3.5 API could be used to download those posts into a JSON file or something so you can then use that to fine-tune?
 
Messages
26
That's an interesting approach. Possibly GPT3.5 API could be used to download those posts into a JSON file or something so you can then use that to fine-tune?
Well, it's mainly going to have to be me manually finding these posts all over the internet, saving them into a file, and annotating them with those [LABELS], then using that to teach it what is and is not an example of one of these treatment posts.

I'll probably also try to pursue the separate project you suggested of just feeding a language model every post from the forum. Which would train it to speak in a way similar to the language used here. It wouldn't have access to specific posts or details, but might still be useful in some way.
 
Messages
26
I've been working on a new, somewhat related, project. I describe it in this blog post. Basically, I'm trying to train an AI (well multiple AIs) to identify people saying that something worked really well for their condition. It would be run across every Reddit comment, every Phoenix Rising post, and other forums. Then the data would be organized so you could see how many people, for example, found relief from D-Ribose for PEM. And you could look at every post where someone said that.

And it would eventually be expanded to look for people saying what did *not* work for them, as well.
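The aggregation step could be as simple as tallying the extracted claims. The tuples below are made-up placeholders for what the classifiers might emit, purely to illustrate the idea:

```python
from collections import Counter

# Hypothetical classifier output: one (treatment, condition, outcome)
# tuple per post where someone reports a first-person result.
extracted = [
    ("d-ribose", "PEM", "helped"),
    ("d-ribose", "PEM", "helped"),
    ("d-ribose", "PEM", "no effect"),
    ("LDN", "fatigue", "helped"),
]

tally = Counter(extracted)
print(tally[("d-ribose", "PEM", "helped")])
```

Each count could then link back to the underlying posts so readers can check the original claims.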
 

hapl808

Senior Member
Messages
1,831
I've been working on a new, somewhat related, project. I describe it in this blog post. Basically, I'm trying to train an AI (well multiple AIs) to identify people saying that something worked really well for their condition. It would be run across every Reddit comment, every Phoenix Rising post, and other forums. Then the data would be organized so you could see how many people, for example, found relief from D-Ribose for PEM. And you could look at every post where someone said that.

This is incredible - and I have to say, a worthy pivot from the initial project, which sounded less immediately useful and somewhat difficult to implement.

This sounds challenging (definitely biting off more), but also very exciting and immediately useful if it works. How many of us search, "d-limonene acid reflux reddit" and start reading anecdotal reports, or search through the forum here for "bone conduction" to see if anyone tried it.

I'm impressed how much you could do as well. With limitations, that's fantastic work. I don't think I could even write that full blog post, let alone the amount of time to manually label, etc.

Couple random thoughts - and keep in mind I have no clue what I'm doing, so might or might not be helpful.

Rather than BERT for classification after the initial pruning, have you thought about using something like Mistral 7B that you could use locally and would run pretty quickly? Seems to have surprisingly decent performance out of the box, and with fine tuning you might be able to get even better performance on automatic classification of stories.

(Also curious - how many images did you train the car classifier? ResNet? It's a neat idea and I notice it's accurate, but not very granular. So I'm guessing not a ton of images, but still has really good performance?)

I hadn't heard of fastText. Amazing how many good, free resources that Meta provides. I've used Prophet for some prediction models and it's incredibly easy to code and outperforms lots of fancy models. Before I got into this, I wouldn't have guessed Meta would be providing lots of resources like this.

I think I'm already forgetting some of what I read. Oh, when I use GPT4 for coding help, I find that custom instructions and breaking things down into clearly named functions can help when the context window is closing. You can tell it to make assumptions about external routines, make your own import statements with comments, etc. This is how I've gotten it to help me code longer things.

And it would eventually be expanded to look for people saying what did *not* work for them, as well.

I think it's more useful to also look at what was harmful, rather than what didn't work. For an actionable use, most don't care that much if something didn't work for some people - that's pretty common, although the ratio is interesting. However, we deeply care if someone says, "It permanently lowered my baseline."

Anyways, exciting project!
 
Messages
26
This sounds challenging (definitely biting off more), but also very exciting and immediately useful if it works. How many of us search, "d-limonene acid reflux reddit" and start reading anecdotal reports, or search through the forum here for "bone conduction" to see if anyone tried it.
Exactly! Just today I was searching the forums for mentions of cumin helping, apart from Hip's list. Scouring through search results is not really efficient.
I'm impressed how much you could do as well. With limitations, that's fantastic work. I don't think I could even write that full blog post, let alone the amount of time to manually label, etc.
I'm definitely feeling PEM from writing for like five hours straight yesterday! But I was too excited to get it finished and wanted it done before bed.
Rather than BERT for classification after the initial pruning, have you thought about using something like Mistral 7B that you could use locally and would run pretty quickly? Seems to have surprisingly decent performance out of the box, and with fine tuning you might be able to get even better performance on automatic classification of stories.
I think Mistral might be mainly for generative AI, like chatbots. BERT is more for classification. But I'll definitely be researching the best models to use when the time comes. There are lots of other classification models, BERT is just one of the original ones.

In any case, any model will need to be run on a computer with GPU. To go through millions of posts on my laptop's CPU would be excruciatingly slow. But there are lots of options for running models on GPU cloud servers, starting at around $0.60 per hour.
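To put a number on that, a quick estimate of renting a GPU for the full run. The corpus size and throughput below are assumptions for illustration; only the hourly rate comes from the figure above:

```python
# Rough runtime/cost estimate for classifying every post on a rented GPU.
posts = 5_000_000          # assumed corpus size
posts_per_second = 50      # assumed GPU classification throughput
dollars_per_hour = 0.60    # cloud GPU rate quoted above

hours = posts / posts_per_second / 3600
print(round(hours, 1), "hours,", round(hours * dollars_per_hour, 2), "dollars")
```

Under these assumptions, millions of posts come out to tens of dollars, not thousands.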
(Also curious - how many images did you train the car classifier? ResNet? It's a neat idea and I notice it's accurate, but not very granular. So I'm guessing not a ton of images, but still has really good performance?)
I think around 200 for each of the years I used. From 1940 to 2020 in multiples of 5. Each year had about 15 images each of about 20 makes of car. So like 1800 total pictures.

It really is amazing. In the AI course I'm doing, the first demonstration is making a brown bear, black bear, teddy bear classifier, and it only takes 150 images of each to get it about 97% accurate.
I think it's more useful to also look at what was harmful, rather than what didn't work. For an actionable use, most don't care that much if something didn't work for some people - that's pretty common, although the ratio is interesting. However, we deeply care if someone says, "It permanently lowered my baseline."
Yeah, the ratio is what I was thinking would be helpful. But absolutely, harmful results would be good to include, too.
 

hapl808

Senior Member
Messages
1,831
I think Mistral might be mainly for generative AI, like chatbots. BERT is more for classification. But I'll definitely be researching the best models to use when the time comes. There are lots of other classification models, BERT is just one of the original ones.

I think Mistral is often trained as a chatbot, but it can also do classification. I'm really not that familiar with classification, though - and maybe I've just heard the hype about Mistral 7B. I've never used BERT, so it may be much better for this task.

In any case, any model will need to be run on a computer with GPU. To go through millions of posts on my laptop's CPU would be excruciatingly slow. But there are lots of options for running models on GPU cloud servers, starting at around $0.60 per hour.

If you're using Colab already, maybe just upgrade to the Pro version? It's easy to switch back and forth between CPU and GPU, and the GPU is quite cheap if you're using a T4 and you can bump to an A100 if you don't mind the extra cost. I think T4 runs about $0.20 and an A100 about $1.00. Much easier if you're already using Colab than trying to figure out some of the other AI cloud services, although they may have other advantages. I love Colab's connection with persistent Google Drive storage.

I think around 200 for each of the years I used. From 1940 to 2020 in multiples of 5. Each year had about 15 images each of about 20 makes of car. So like 1800 total pictures.

That makes sense! I didn't notice it only had multiples of 5, but that explains the granularity. When I put in a 1957, it would say 1955 or 1965, maybe. Amazingly accurate when you consider how many specific car types and such it hasn't seen but can still likely classify.

Do you like the AI course? I've done the Stanford / Coursera Andrew Ng courses that I really like, but his are very theory heavy and less project-based.
 
Messages
26
If you're using Colab already, maybe just upgrade to the Pro version? It's easy to switch back and forth between CPU and GPU, and the GPU is quite cheap if you're using a T4 and you can bump to an A100 if you don't mind the extra cost. I think T4 runs about $0.20 and an A100 about $1.00. Much easier if you're already using Colab than trying to figure out some of the other AI cloud services, although they may have other advantages. I love Colab's connection with persistent Google Drive storage.
Yeah, I'll explore the Colab option. Although I was a bit confused about their pricing. It says $9.99/month for Pro, which is 100 compute credits, and people were saying that didn't last very long. I hadn't seen a per hour price, but I'll look for that.
Do you like the AI course? I've done the Stanford / Coursera Andrew Ng courses that I really like, but his are very theory heavy and less project-based.
Yeah, this is basically the opposite, which makes it really fun. The very first thing he shows is how to get an image classifier trained and running. And the lessons slowly go deeper and lower level. Never really going into the complicated math underlying AI, since he says for most practical AI projects it's not really useful to understand that.

The homepage of the course gives a good description of what it's about.
 

hapl808

Senior Member
Messages
1,831
Yeah, I'll explore the Colab option. Although I was a bit confused about their pricing. It says $9.99/month for Pro, which is 100 compute credits, and people were saying that didn't last very long. I hadn't seen a per hour price, but I'll look for that.

It's such a weird confusing system. Here's my understanding. CPU only is around 0.08 compute units per hour, T4 GPU is 2.00, A100 is 13.00. So you can do the math on how long compute units last - depends which environment you spin up. Extra RAM is also a small bump, but relatively minimal.
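Working those rates through, a sketch of how long 100 compute units last in each environment. The burn rates are the as-remembered figures from this thread, not official Colab pricing:

```python
# How long 100 Colab compute units last at the rates quoted above.
units = 100
burn_per_hour = {"cpu": 0.08, "t4": 2.00, "a100": 13.00}  # units/hour, as remembered

hours = {env: units / rate for env, rate in burn_per_hour.items()}
print({env: round(h, 1) for env, h in hours.items()})
```

So the same 100 units buy over a thousand CPU hours, about fifty T4 hours, or under eight A100 hours.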

Yeah, this is basically the opposite, which makes it really fun. The very first thing he shows is how to get an image classifier trained and running. And the lessons slowly go deeper and lower level. Never really going into the complicated math underlying AI, since he says for most practical AI projects it's not really useful to understand that.

Interesting - thanks. That sounds very worthwhile. I did really like Ng's course, but it's heavy on theory, linear algebra explanations of compute graphs, how and why back propagation works, calculus chain rule, etc. Combined with some of Andrej Karpathy's videos it gives extremely good understanding of why this stuff works, but not as much application. GPT4 is great though at implementing things where I know the theory but not how to code it.
 
Messages
26
Interesting - thanks. That sounds very worthwhile. I did really like Ng's course, but it's heavy on theory, linear algebra explanations of compute graphs, how and why back propagation works, calculus chain rule, etc. Combined with some of Andrej Karpathy's videos it gives extremely good understanding of why this stuff works, but not as much application. GPT4 is great though at implementing things where I know the theory but not how to code it.
That all sounds super interesting. Unfortunately my low mental bandwidth doesn't currently allow me much room for learning all the things I find interesting. I have an ever increasing list of books from all sorts of topics, but I can barely read more than a few pages a day.
 

Alvin2

The good news is patients don't die the bad news..
Messages
2,892
I hope you did not scrape any members-only forums for the AI.
If you did, then at least let us know, if it's too late to delete it.
 

hapl808

Senior Member
Messages
1,831
That all sounds super interesting. Unfortunately my low mental bandwidth doesn't currently allow me much room for learning all the things I find interesting. I have an ever increasing list of books from all sorts of topics, but I can barely read more than a few pages a day.

I have the same issue. I work on a lesson for 30 mins, then have to recover for a day or two. Very frustrating because there's a lot I'd like to do, but I have no way to decide what to focus on when my energy is so limited. I'm not sure there's much I can do for 30 mins every couple days that will lead to profitable work, so…

I used to be able to read a book every 2-3 days. Now it takes me a month of consistent reading. And if I crash, then I can't do anything. If I were disabled but could work on interesting things without constant crashing, life would be a lot better. Still haven't found anything that allows me to do that.