spacy-rasa-basketball-xml
Date: 2025 09-September 19
what the heck is rasa? it is huge! is it really necessary for determining user intent?
Ah — yes, that’s a very fair reaction. Rasa is massive because it’s essentially a full-fledged chatbot framework, not just an intent classifier. It bundles:
-
NLU pipeline (tokenizers, featurizers, entity recognition, intent classification)
-
Dialogue management (deciding what the bot should say next)
-
Action server / custom actions
-
Training infrastructure, model serialization, and more.
For your project — where the goal is simply: “extract intent + entities from a short text describing grouping rules” — Rasa is overkill. You’re paying hundreds of MBs and dozens of dependencies just to classify one intent and maybe a few entities.
✅ Lightweight alternatives
Since you already use spaCy, you can implement a minimal intent + entity extraction pipeline without Rasa at all:
-
Intent classification:
-
Use a text classifier in spaCy (
TextCategorizer) -
Train it on a few examples (
group_data,other_intent, …)
-
-
Entity extraction:
-
Use spaCy’s NER (
EntityRuler) or PatternMatcher -
Or just simple regex/dictionary matching for group_level_1, group_level_2, entity_level
-
-
Data points:
- Continue using spaCy noun chunks or simple regex to detect metrics (
points,assists, etc.)
- Continue using spaCy noun chunks or simple regex to detect metrics (
Pros:
-
Lightweight (~50–100 MB vs 1–2 GB with Rasa)
-
Faster load times
-
Easier to deploy
-
Full control over patterns and labels
Cons:
-
Doesn’t scale as well for very complex dialogues
-
You have to manually curate patterns or training examples
Example: Minimal spaCy-based intent/entity extraction
import spacy
from spacy.pipeline import EntityRuler, TextCategorizer
nlp = spacy.load("en_core_web_sm")
# Intent classifier
if "textcat" not in nlp.pipe_names:
textcat = nlp.add_pipe("textcat")
textcat.add_label("group_data")
# Simple entity patterns
ruler = nlp.add_pipe("entity_ruler")
patterns = [
{"label": "group_level_1", "pattern": [{"LOWER": "team"}]},
{"label": "group_level_2", "pattern": [{"LOWER": "position"}]},
{"label": "entity_level", "pattern": [{"LOWER": "players"}]},
]
ruler.add_patterns(patterns)
doc = nlp("Group my players by team and then by position")
print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])
-
This will detect the entities without Rasa.
-
For intent, you can train
TextCategorizeron 10–20 examples.
💡 Bottom line:
If your app only needs one intent (group_data) and a handful of entities, Rasa is not necessary. A pure spaCy solution is smaller, faster, and easier to maintain.
If you want, I can rewrite your current nlp_processor.py to remove Rasa entirely and use spaCy + patterns only, so you can run the app without downloading a giant Rasa stack. This would reduce dependencies to just Flask + spaCy.
About the Rasa output:
-
You got:
ERROR ... Cannot train 'DIETClassifier'. Need at least 2 different intent classes.That’s normal if your
nlu.ymlhas only one intent. Rasa requires at least 2 intents to train the classifier properly. You can ignore it for now if your goal is just entity extraction, or add more intents.
Recommended new structure:
/spacy-rasa-basketball-xml
│
├─ /data/ # All Rasa training data
│ ├─ nlu.yml
│ ├─ rules.yml # optional
│ └─ stories.yml # optional, if you plan to add dialogue
│
├─ /models/ # Trained Rasa models
│ └─ ... (auto-created tar.gz)
│
├─ /config/ # Rasa configuration files
│ └─ config.yml
│
├─ /domain/ # Rasa domain files
│ └─ domain.yml
│
├─ /app/ # Your application code
│ ├─ init.py
│ ├─ app.py # main FastAPI/Flask/etc. app
│ ├─ nlp_processor.py # wrappers around Rasa or SpaCy
│ └─ /templates/ # HTML/Jinja templates
│
├─ /thirdparty/ # Third-party LICENSEs (Rasa, Spacy, etc.)
│ └─ rasa_LICENSE.txt
│
├─ pyproject.toml
├─ poetry.lock
├─ README.md
└─ requirements.txt # optional, mirror pyproject.toml