Intern – Record Linkage Data Science
2025-05-16T09:00:01+00:00
Centre for Infectious Disease Research in Zambia ( CIDRZ )
https://cdn.greatzambiajobs.com/jsjobsdata/data/employer/comp_2324/logo/Centre%20for%20Infectious%20Disease%20Research%20in%20Zambia.jpg
https://www.cidrz.org/
FULL_TIME
Lusaka
Lusaka
10101
Zambia
Health Care
Management
2025-05-21T17:00:00+00:00
Zambia
8
Reporting to the Senior Data Technical Support Officer, the Intern will be trained in processing, linking, and analysing data using Python and specialized record linkage techniques. This involves cleaning and standardizing data, implementing quality checks, using string and numeric matching algorithms to accurately link records across datasets, and documenting the linkage process. The Intern will also perform exploratory data analysis to derive insights and contribute to data-driven decision-making. Strong collaboration, communication, and documentation skills are essential to work effectively with the team and stakeholders while adhering to data sensitivity and confidentiality requirements.
Key activities:
- Conduct data cleaning, standardization, and formatting using Python libraries such as Pandas and NumPy.
- Implement data quality checks and validation procedures to identify and rectify errors or inconsistencies.
- Use advanced record linkage techniques with Python libraries including Scikit-Learn, Splink, recordlinkage, and Fuzzy to identify and match records across multiple datasets.
- Configure and optimize linkage models to improve accuracy and efficiency.
- Document and maintain detailed records of linkage processes, parameters, and outcomes.
- Perform exploratory data analysis (EDA) using Python data analysis libraries to understand dataset characteristics and patterns.
- Collaborate with team members to extract actionable insights from linked datasets and support data-driven decision-making.
- Prepare comprehensive documentation outlining the record linkage methodology, including the implementation of Splink and fuzzy matching algorithms.
- Generate regular reports summarizing progress, challenges, and outcomes of record linkage activities.
- Collaborate with cross-functional teams, including software developers, analysts, and subject matter experts, to understand data requirements and project objectives.
- Communicate effectively with stakeholders to gather feedback, address concerns, and ensure alignment with project goals.
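For illustration only, the cleaning and string-matching activities listed above could be sketched as follows. This is a minimal example using Python's standard library rather than the libraries named in the posting; the function names, the sample records, and the 0.85 threshold are all hypothetical assumptions, not CIDRZ's actual method.

```python
from difflib import SequenceMatcher


def standardize(name: str) -> str:
    """Lowercase the name, trim it, and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two standardized names."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right match if the score
    reaches the (hypothetical) threshold."""
    links = []
    for left_id, left_name in left.items():
        best = max(
            ((right_id, similarity(left_name, right_name))
             for right_id, right_name in right.items()),
            key=lambda pair: pair[1],
            default=None,
        )
        if best is not None and best[1] >= threshold:
            links.append((left_id, best[0], round(best[1], 3)))
    return links


# Made-up sample data standing in for two datasets to be linked.
dataset_a = {1: "  Mary  Banda ", 2: "John Phiri"}
dataset_b = {"x": "mary banda", "y": "Jon Phiri", "z": "Grace Zulu"}

links = link_records(dataset_a, dataset_b)
for left_id, right_id, score in links:
    print(left_id, right_id, score)
```

In practice, a probabilistic tool such as Splink would replace the single similarity threshold with a model that weighs several fields together; the sketch only shows the standardize-compare-link pattern.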
Qualifications
- Full Grade 12 certificate
- A Bachelor’s degree in Computer Science, Statistics, Artificial Intelligence, Data Science, or a related field.
- Proficiency in the Python programming language, with experience in libraries such as Pandas, NumPy, Scikit-Learn, Statsmodels, and Splink.
- Familiarity with the application of mathematical/statistical modelling, and deterministic and probabilistic string and numeric comparator/match algorithms, is advantageous.
- Strong analytical and critical thinking skills, with meticulous attention to detail.
- Excellent communication and interpersonal skills, with the ability to work both independently and collaboratively in a team environment.
JOB-6826fe912483f
Vacancy title:
Intern – Record Linkage Data Science
[Type: FULL_TIME, Industry: Health Care, Category: Management]
Jobs at:
Centre for Infectious Disease Research in Zambia ( CIDRZ )
Deadline of this Job:
Wednesday, May 21 2025
Duty Station:
Lusaka | Lusaka | Zambia
Summary
Date Posted: Friday, May 16 2025, Base Salary: Not Disclosed
Work Hours: 8
Experience in Months: 24
Level of Education: bachelor degree
Job application procedure
- Suitably qualified candidates are invited to apply. However, only shortlisted candidates will be contacted.
- To apply for this job, please visit www.cidrz.org.