Iterative Strategies for a Neural Net-based Search Solution
As companies grow and scale their operations, they often face challenges in improving and expanding their search solutions. In this post, I'll share my experience scaling a neural net-based search solution from just two websites to over 100, highlighting the challenges and strategies at each stage of growth.
Background
Before we dive in, let me introduce myself. I'm Mary Loubele, holding a Master's in Computer Science and a PhD in Medical Image Computing. I've worked as a Machine Learning Architect at Mappedin and a Data Growth Coach at Communitech, specializing in scaling prototypes to production. I've also organized data meetups in the Kitchener-Waterloo (KW) region.
The Initial Problem
Imagine you're a SaaS company maintaining websites for various clients. You track core analytics such as search terms, results shown, and results selected. Your goal is to improve the search functionality beyond simple fuzzy matching. For instance, a neural net-based search should direct a user who types "jo" to the careers page, inferring that the query is likely the start of "jobs" rather than matching it letter by letter.
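To see why plain fuzzy matching falls short here, consider how a similarity-based matcher scores a two-letter query against page titles. This sketch uses Python's standard-library difflib as an illustrative stand-in for any fuzzy matcher, with a made-up page list:

```python
from difflib import SequenceMatcher

PAGES = ["Careers", "Contact", "Journal", "Home"]  # illustrative page titles

def fuzzy_rank(query, pages):
    """Rank pages by raw string similarity to the query."""
    scored = [(p, SequenceMatcher(None, query.lower(), p.lower()).ratio())
              for p in pages]
    return sorted(scored, key=lambda s: s[1], reverse=True)

ranking = fuzzy_rank("jo", PAGES)
# "Journal" wins on character overlap; "Careers" scores zero,
# even though a user typing "jo" almost certainly wants jobs.
```

Character overlap ranks "Journal" first and gives "Careers" no score at all, which is exactly the gap a model trained on query-to-selection analytics can close.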
The Initial Prototype
Our first prototype leveraged session analytics stored in a NoSQL database. We built out the search solution for two major websites using LSTMs (Long Short-Term Memory networks), with manual effort to reduce bias in the results. This approach worked well at a small scale, but how could we expand it?
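A minimal sketch of the kind of model involved: a character-level LSTM that encodes a query and classifies it into one of the site's candidate pages. All layer sizes, the vocabulary, and the encoding scheme here are illustrative assumptions, not the production setup:

```python
import torch
import torch.nn as nn

class QueryClassifier(nn.Module):
    """Character-level LSTM mapping a search query to a target page."""
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64, n_pages=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_pages)

    def forward(self, char_ids):            # char_ids: (batch, seq_len)
        x = self.embed(char_ids)
        _, (h, _) = self.lstm(x)            # h: (num_layers, batch, hidden_dim)
        return self.out(h[-1])              # logits over candidate pages

def encode(query, max_len=16):
    """Encode a query as padded ASCII code points (hypothetical scheme)."""
    ids = [ord(c) % 128 for c in query[:max_len]]
    ids += [0] * (max_len - len(ids))
    return torch.tensor([ids])

model = QueryClassifier()
logits = model(encode("jo"))                # one score per candidate page
```

Trained on (query, selected link) pairs from the session analytics, a model like this can rank pages by intent even for short prefixes.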
Scaling Strategies
2 to 20 Websites (Startups in a City)
Team: Architect, Developer, QA intern, Front-end dev
Challenges:
Insufficient analytics data for some websites
Need for link name normalization
Building and deploying a search server
Handling links with limited search data
Manual threshold determination for biased results
Manual coding of joins in the NoSQL datastore
Building the front-end
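Link name normalization deserves a concrete illustration. Different websites label the same destination differently ("Join Our Team!", "join-our-team", "Careers »"), and the training data is far cleaner when those variants collapse to one canonical key. A minimal sketch, where the alias table is hypothetical:

```python
import re
import unicodedata

# Hypothetical alias table mapping cleaned names to canonical keys
ALIASES = {
    "join our team": "careers",
    "jobs": "careers",
    "work with us": "careers",
}

def normalize_link(name):
    """Collapse link-name variants to a canonical key."""
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode()   # drop accents and symbols
    name = re.sub(r"[-_/]+", " ", name.lower())      # unify separators
    name = re.sub(r"[^a-z0-9 ]+", "", name).strip()  # strip punctuation
    name = re.sub(r"\s+", " ", name)                 # squeeze whitespace
    return ALIASES.get(name, name)
```

With this in place, "Join Our Team!", "join-our-team", and "Careers »" all resolve to the same key, so one model label covers every site's phrasing.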
20 to 90 Websites (Startups and Hair Salons in a City)
Team: Architect, Developer, Automation Developer
Challenges:
Scaling server infrastructure
Automating threshold determination for biased results
Continued issues with data scarcity and link normalization
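Automating threshold determination meant replacing a hand-picked cutoff with one derived from the analytics themselves. One plausible approach, sketched below as an assumption rather than the exact method used: keep a result only if its model score clears a low percentile of historical scores for results users actually clicked.

```python
def percentile(values, pct):
    """Nearest-rank percentile, no external dependencies."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]

def auto_threshold(clicked_scores, pct=10):
    """Cut off results scoring below the pct-th percentile of clicked scores."""
    return percentile(clicked_scores, pct)

# Historical model scores for results users actually clicked (illustrative data)
clicked = [0.91, 0.85, 0.88, 0.72, 0.95, 0.80, 0.77, 0.90]
threshold = auto_threshold(clicked)
```

Recomputing the threshold per website from its own click data removes one manual step per site, which is exactly what matters when the site count grows from 20 toward 90.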
90 to 100+ Websites
Team: Architect, Developer, Automation Developer, Machine Learning Specialist
Challenges:
Potential cost issues with server infrastructure
Need for more advanced link name normalization
Reassessment of the search server architecture
Reevaluation of automated threshold determination
Beyond 100 Websites
Team: Architect, Developer, Automation Developers, Machine Learning Specialists
Solutions:
Reassess the need for fancy search on all websites
Implement more advanced link name normalization
Re-engineer the search server
Readjust infrastructure and reassign models
Switch from NoSQL to SQL Data Warehouse with STAR schema
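The last item is worth unpacking. In a STAR schema, one central fact table (search events) references small dimension tables (website, link), so the joins that previously had to be hand-coded in application logic against the NoSQL store become ordinary SQL. A minimal sketch using SQLite, with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables
    CREATE TABLE dim_website (website_id INTEGER PRIMARY KEY, domain TEXT);
    CREATE TABLE dim_link    (link_id    INTEGER PRIMARY KEY, canonical_name TEXT);
    -- Central fact table: one row per search event
    CREATE TABLE fact_search (
        website_id INTEGER REFERENCES dim_website(website_id),
        link_id    INTEGER REFERENCES dim_link(link_id),
        query      TEXT,
        clicked    INTEGER
    );
""")
conn.execute("INSERT INTO dim_website VALUES (1, 'example.com')")
conn.execute("INSERT INTO dim_link VALUES (1, 'careers')")
conn.execute("INSERT INTO fact_search VALUES (1, 1, 'jo', 1)")

# The join that needed manual application code against the NoSQL store:
row = conn.execute("""
    SELECT w.domain, l.canonical_name, f.query
    FROM fact_search f
    JOIN dim_website w USING (website_id)
    JOIN dim_link    l USING (link_id)
    WHERE f.clicked = 1
""").fetchone()
```

Aggregations across all 100+ websites (click-through per canonical link, per-site thresholds) then reduce to GROUP BY queries instead of bespoke join code.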
Key Takeaways
Scaling incrementally (2 -> 20 -> 90 -> 100 -> 100+) is easier than trying to jump from 2 to 100+ all at once.
Don't try to automate everything simultaneously.
Balance the opportunity cost against automation costs.
Ensure you can continue delivering results throughout the scaling process.
Remember, scaling a neural net-based search solution is a journey. Each stage presents unique challenges, but with the right team and approach, you can successfully expand your solution to serve hundreds of websites effectively.
If you have any questions or want to discuss this further, feel free to reach out at mary@1936.ca.