Job Title :- Sr. Data Scientist
Duration :- 6+ Months
Location :- Raleigh, NC (Hybrid)
Visa :- No H1B or CPT
Interview :- Virtual
The project:
- They process an extremely large volume of data.
- They process 15,000 documents per minute (about 250 per second). THE CANDIDATE MUST HAVE WORKED WITH DATA AT ROUGHLY THIS SCALE.
- The goal is to process the documents and extract enrichments from them.
- They collect news from across the world in real time and run an NLP algorithm to identify the companies, products, and similar entities mentioned, then link each mention to the matching company or person in their records.
- Sentiment analysis: it's complex.
- They want to know generally if an article is positive or negative about a certain person or company.
- A single article may be negative about one company and positive about another.
- They also want to extract additional details, such as when an event occurred.
- For example, if an article says "last month", which date does that refer to? Events need to be categorized by timeline.
Skills:
- Minimum 5 years of experience using NLP tools and methods such as OpenNLP, Stanford NLP, LDA, Gensim, spaCy
- Natural Language Processing (NLP): Relevant experience with NLP tools and techniques.
- Proficiency in Python: Familiarity with Python and relevant libraries.
Nice to have:
- Databricks: Experience with Databricks or similar platforms.
- PySpark: Hands-on experience.
- Snowflake: Knowledge of this platform.
- spaCy and Gensim: Leveraging spaCy for Named Entity Recognition (NER) and Gensim for LDA topic modeling to provide contextual insights.
- PyTorch and NumPy: Proficiency in these libraries.
- Large Language Models (LLMs): Strong focus on LLMs and fine-tuning expertise.
- Cloud: Cloud experience is a bonus.
- MLOps: Exposure to MLOps tools and practices.
Thanks & Regards
Aman Mishra | Sr. Technical Recruiter
Desk : 215-258-8939
First Ring Solutions LLC