spacy-rasa-basketball-xml

Date: 2025 09-September 19

what the heck is rasa? it is huge! is it really necessary for determining user intent?

Ah — yes, that’s a very fair reaction. Rasa is massive because it’s essentially a full-fledged chatbot framework, not just an intent classifier. It bundles:

NLU pipeline (tokenizers, featurizers, entity recognition, intent classification)
Dialogue management (deciding what the bot should say next)
Action server / custom actions
Training infrastructure, model serialization, and more.

For your project — where the goal is simply: “extract intent + entities from a short text describing grouping rules” — Rasa is overkill. You’re paying hundreds of MBs and dozens of dependencies just to classify one intent and maybe a few entities.

✅ Lightweight alternatives

Since you already use spaCy, you can implement a minimal intent + entity extraction pipeline without Rasa at all:

Intent classification:
- Use a text classifier in spaCy (TextCategorizer)
- Train it on a few examples (group_data, other_intent, …)
Entity extraction:
- Use spaCy’s NER (EntityRuler) or PatternMatcher
- Or just simple regex/dictionary matching for group_level_1, group_level_2, entity_level
Data points:
- Continue using spaCy noun chunks or simple regex to detect metrics (points, assists, etc.)

Pros:

Lightweight (~50–100 MB vs 1–2 GB with Rasa)
Faster load times
Easier to deploy
Full control over patterns and labels

Cons:

Doesn’t scale as well for very complex dialogues
You have to manually curate patterns or training examples

Example: Minimal spaCy-based intent/entity extraction

import spacy
from spacy.pipeline import EntityRuler, TextCategorizer

nlp = spacy.load("en_core_web_sm")

# Intent classifier
if "textcat" not in nlp.pipe_names:
    textcat = nlp.add_pipe("textcat")
    textcat.add_label("group_data")

# Simple entity patterns
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "group_level_1", "pattern": [{"LOWER": "team"}]},
    {"label": "group_level_2", "pattern": [{"LOWER": "position"}]},
    {"label": "entity_level", "pattern": [{"LOWER": "players"}]},
]
ruler.add_patterns(patterns)

doc = nlp("Group my players by team and then by position")

print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])

This will detect the entities without Rasa.
For intent, you can train TextCategorizer on 10–20 examples.

💡 Bottom line:

If your app only needs one intent (group_data) and a handful of entities, Rasa is not necessary. A pure spaCy solution is smaller, faster, and easier to maintain.

If you want, I can rewrite your current nlp_processor.py to remove Rasa entirely and use spaCy + patterns only, so you can run the app without downloading a giant Rasa stack. This would reduce dependencies to just Flask + spaCy.

About the Rasa output:

You got:

ERROR ... Cannot train 'DIETClassifier'. Need at least 2 different intent classes.

That’s normal if your nlu.yml has only one intent. Rasa requires at least 2 intents to train the classifier properly. You can ignore it for now if your goal is just entity extraction, or add more intents.

Recommended new structure:
/spacy-rasa-basketball-xml
│
├─ /data/ # All Rasa training data
│ ├─ nlu.yml
│ ├─ rules.yml # optional
│ └─ stories.yml # optional, if you plan to add dialogue
│
├─ /models/ # Trained Rasa models
│ └─ ... (auto-created tar.gz)
│
├─ /config/ # Rasa configuration files
│ └─ config.yml
│
├─ /domain/ # Rasa domain files
│ └─ domain.yml
│
├─ /app/ # Your application code
│ ├─ init.py
│ ├─ app.py # main FastAPI/Flask/etc. app
│ ├─ nlp_processor.py # wrappers around Rasa or SpaCy
│ └─ /templates/ # HTML/Jinja templates
│
├─ /thirdparty/ # Third-party LICENSEs (Rasa, Spacy, etc.)
│ └─ rasa_LICENSE.txt
│
├─ pyproject.toml
├─ poetry.lock
├─ README.md
└─ requirements.txt # optional, mirror pyproject.toml