spacy-rasa-basketball-xml

Date: 2025 09-September 19

what the heck is rasa? it is huge! is it really necessary for determining user intent?

Ah, yes, that’s a very fair reaction. Rasa is massive because it’s essentially a full-fledged chatbot framework, not just an intent classifier.

For your project, where the goal is simply to extract intent + entities from a short text describing grouping rules, Rasa is overkill. You’re paying hundreds of megabytes and dozens of dependencies just to classify one intent and maybe a few entities.


✅ Lightweight alternatives

Since you already use spaCy, you can implement a minimal intent + entity extraction pipeline without Rasa at all:

  1. Intent classification:

    • Use a text classifier in spaCy (TextCategorizer)

    • Train it on a few examples (group_data, other_intent, …)

  2. Entity extraction:

    • Use spaCy’s rule-based EntityRuler, or its Matcher / PhraseMatcher

    • Or just simple regex/dictionary matching for group_level_1, group_level_2, entity_level

  3. Data points:

    • Continue using spaCy noun chunks or simple regex to detect metrics (points, assists, etc.)
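The "simple regex/dictionary matching" option from steps 2 and 3 could look roughly like the sketch below. It uses only the standard library. The slot names (group_level_1, group_level_2, entity_level) come from the list above; the keyword sets and the extract_slots helper are hypothetical and only there for illustration.

```python
import re

# Hypothetical keyword tables: the words listed here are illustrative assumptions,
# not taken from the real project.
GROUPABLE = {"team", "position", "conference"}
ENTITY_LEVELS = {"players", "teams"}
METRICS = {"points", "assists", "rebounds"}

BY_PATTERN = re.compile(r"\bby\s+(\w+)", re.IGNORECASE)  # captures the word after "by"
WORD_PATTERN = re.compile(r"\w+")


def extract_slots(text):
    """Return grouping levels, entity level, and metrics found in the text."""
    # Grouping levels are the known groupable words that directly follow "by", in order.
    levels = [w.lower() for w in BY_PATTERN.findall(text) if w.lower() in GROUPABLE]
    words = [w.lower() for w in WORD_PATTERN.findall(text)]
    # The entity level is the first word that appears in the entity dictionary.
    entity = next((w for w in words if w in ENTITY_LEVELS), None)
    return {
        "group_level_1": levels[0] if len(levels) > 0 else None,
        "group_level_2": levels[1] if len(levels) > 1 else None,
        "entity_level": entity,
        "data_points": [w for w in words if w in METRICS],
    }


result = extract_slots(
    "Group my players by team and then by position, showing points and assists"
)
print(result)
```

This is brittle by design: it only sees words that are in the dictionaries, so it suits a narrow, predictable input format rather than free-form text.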

Pros:

Cons:


Example: Minimal spaCy-based intent/entity extraction

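import spacy

nlp = spacy.load("en_core_web_sm")

# Intent classifier: registered here but still untrained.
# "textcat" uses mutually exclusive classes, so it needs at least two labels
# and a training pass before it can predict anything.
if "textcat" not in nlp.pipe_names:
    textcat = nlp.add_pipe("textcat")
    textcat.add_label("group_data")
    textcat.add_label("other_intent")

# Simple entity patterns; the ruler runs before the statistical NER
# so that its matches take priority
ruler = nlp.add_pipe("entity_ruler", before="ner")
patterns = [
    {"label": "group_level_1", "pattern": [{"LOWER": "team"}]},
    {"label": "group_level_2", "pattern": [{"LOWER": "position"}]},
    {"label": "entity_level", "pattern": [{"LOWER": "players"}]},
]
ruler.add_patterns(patterns)

# Skip the untrained textcat for now; calling it before training raises an error
doc = nlp("Group my players by team and then by position", disable=["textcat"])

print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])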
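Step 1 says to train the TextCategorizer on a few examples. A minimal sketch of that training loop, assuming spaCy 3.x, is below. The training sentences, the epoch count, and the make_example helper are made up for illustration, and a real classifier would need far more varied data. It starts from a blank English pipeline so that no pretrained model download is needed.

```python
import random

import spacy
from spacy.training import Example

# Toy training set (illustrative only)
TRAIN = [
    ("Group my players by team", "group_data"),
    ("Group players by team and then by position", "group_data"),
    ("Organize the roster by position", "group_data"),
    ("Split the players by team", "group_data"),
    ("What time is the game tonight?", "other_intent"),
    ("Hello there", "other_intent"),
    ("Who won the match yesterday?", "other_intent"),
    ("Thanks, goodbye", "other_intent"),
]
LABELS = ["group_data", "other_intent"]

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # mutually exclusive classes
for label in LABELS:
    textcat.add_label(label)


def make_example(text, intent):
    """Build a spaCy Example with a 1.0/0.0 score for every label."""
    cats = {label: float(label == intent) for label in LABELS}
    return Example.from_dict(nlp.make_doc(text), {"cats": cats})


examples = [make_example(text, intent) for text, intent in TRAIN]
optimizer = nlp.initialize(lambda: examples)

for _ in range(20):  # epoch count is an arbitrary choice for this sketch
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("Group the players by position")
intent = max(doc.cats, key=doc.cats.get)  # label with the highest score
print(intent, doc.cats)
```

With so little data the predicted scores will vary from run to run, so treat the output as a smoke test rather than a measure of accuracy.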

💡 Bottom line:

If your app only needs one intent (group_data) and a handful of entities, Rasa is not necessary. A pure spaCy solution is smaller, faster, and easier to maintain.


If you want, I can rewrite your current nlp_processor.py to remove Rasa entirely and use spaCy + patterns only, so you can run the app without downloading a giant Rasa stack. This would reduce dependencies to just Flask + spaCy.


About the Rasa output:


Recommended new structure:
/spacy-rasa-basketball-xml

├─ /data/ # All Rasa training data
│ ├─ nlu.yml
│ ├─ rules.yml # optional
│ └─ stories.yml # optional, if you plan to add dialogue

├─ /models/ # Trained Rasa models
│ └─ ... (auto-created tar.gz)

├─ /config/ # Rasa configuration files
│ └─ config.yml

├─ /domain/ # Rasa domain files
│ └─ domain.yml

├─ /app/ # Your application code
│ ├─ __init__.py
│ ├─ app.py # main FastAPI/Flask/etc. app
│ ├─ nlp_processor.py # wrappers around Rasa or spaCy
│ └─ /templates/ # HTML/Jinja templates

├─ /thirdparty/ # Third-party licenses (Rasa, spaCy, etc.)
│ └─ rasa_LICENSE.txt

├─ pyproject.toml
├─ poetry.lock
├─ README.md
└─ requirements.txt # optional, mirror pyproject.toml