
Python Web Scraping Engineer
Experience Level: Mid Level
Python Web Scraping Engineer - Pune/Chennai
Job Reference: R24_0000605
The proliferation of e-commerce has generated vast amounts of data, often difficult for companies to leverage effectively for strategic decision-making and impact measurement. Data Impact by NielsenIQ addresses this challenge by collecting, processing, and analyzing over 60 billion data points daily, transforming them into innovative monitoring and action tools. Our mission is to provide clients with real-time market visibility. Data Impact by NielsenIQ is a leader in the 'Retail Analytics' sector.
About the Role
As we experience rapid international growth, we are seeking talented individuals to join our dynamic and experienced team. Embrace a true startup spirit with significant career opportunities in a supportive and trusting environment that encourages autonomy and challenges.
Responsibilities
- Capture extensive data from web and mobile platforms, and design architectures for extraction, deduplication, classification, clustering, and filtering.
- Design and develop distributed web crawlers, capable of independently resolving development challenges.
- Research and implement algorithms for web page information extraction to enhance data capture efficiency and quality.
- Analyze and warehouse crawled data, monitor crawler systems, and handle anomaly alerts.
- Develop data collection strategies and anti-blocking rules to optimize data acquisition efficiency and quality.
- Design and develop core algorithms aligned with system data processing flows and business requirements.
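To illustrate the extraction-and-deduplication work described above, here is a minimal, stdlib-only sketch of content-hash deduplication; the function names and normalization rules are illustrative assumptions, not part of the role description:

```python
import hashlib


def content_fingerprint(text: str) -> str:
    # Collapse whitespace and lowercase so trivially reformatted
    # copies of the same page hash to the same fingerprint.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Keep the first occurrence of each distinct page body.

    `pages` is an iterable of (url, body) tuples.
    """
    seen = set()
    unique = []
    for url, body in pages:
        fp = content_fingerprint(body)
        if fp not in seen:
            seen.add(fp)
            unique.append((url, body))
    return unique
```

In a real pipeline the fingerprint set would typically live in a shared store (e.g. Redis) so that distributed crawler workers can deduplicate against each other.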
Qualifications
Must Haves:
- Proficiency in Python and experience with crawler frameworks like Scrapy or similar, with independent development experience.
- 1-15 years of relevant experience.
- Familiarity with vertical-search and distributed web crawlers; a deep understanding of web crawler principles; extensive experience in data crawling, parsing, cleaning, and storage; and mastery of anti-crawler technologies and countermeasures.
- Proficiency in basic Linux operations.
- Experience in distributed crawler architecture design, IP farms, and proxies is advantageous.
- A strong foundation in data structures and algorithms is preferred.
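The proxy experience mentioned above commonly involves round-robin rotation with a blacklist for failing endpoints. The sketch below is a hedged, stdlib-only illustration of that pattern; the class and method names are assumptions, not a description of any actual stack:

```python
import itertools


class ProxyRotator:
    """Round-robin proxy rotation with a bad-proxy blacklist."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._pool = itertools.cycle(self._proxies)
        self._bad = set()

    def next_proxy(self) -> str:
        # One full pass over the cycle is enough to visit every proxy once.
        for _ in range(len(self._proxies)):
            proxy = next(self._pool)
            if proxy not in self._bad:
                return proxy
        raise RuntimeError("all proxies marked bad")

    def mark_bad(self, proxy: str) -> None:
        # Called when a proxy times out or gets blocked by the target site.
        self._bad.add(proxy)
```

A crawler worker would call `next_proxy()` per request and `mark_bad()` on repeated failures, typically alongside per-proxy rate limits and periodic health rechecks.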
Good to Have:
- Familiarity with common data storage and various data processing technologies.
- Knowledge of common technologies such as SSH, multi-threading, and network communication programming.
- Experience with at least one RDBMS and one NoSQL (non-relational) database technology.
- Hands-on experience crawling e-commerce platforms is a significant plus.
Additional Information & Benefits
- Enjoy a flexible and rewarding work environment with peer-to-peer recognition.
- Access to wellness plans for you and your family.
- Financial wellness tools to help you plan your future.
- Opportunities for career development to stay relevant and upskill.
- Flexible working arrangements.
- Volunteer time off.
- LinkedIn Learning access.
- Employee Assistance Program (EAP).
Company
NielsenIQ
NIQ (formerly NielsenIQ) is a global leader in consumer intelligence, dedicated to providing unparalleled insights into consumer purchasing behavior and identifying new growth opportunities. In 2023, ...