How to Build a Machine Learning/Data Science Portfolio That Gets You Hired!

Brief Summary

This video sets out a framework for building data science and machine learning portfolio projects that stand out to employers. It stresses five things: choosing topics you care about, sourcing unique data, building end-to-end solutions, using industry-standard tools, and communicating your work effectively. The key takeaway is to showcase your problem-solving skills, creativity, and ability to translate technical work into business value, rather than replicating generic projects.

  • Choose projects based on genuine interests to fuel motivation and add depth.
  • Use unique data sources like APIs, web scraping, or self-generated data.
  • Build complete end-to-end pipelines, including data collection, preprocessing, model deployment, and monitoring.
  • Use industry-standard tools like cloud platforms, Docker, and MLflow.
  • Communicate your work through well-documented code, blog posts, and interactive dashboards.

What should you build for your data science/machine learning portfolio?

The video starts from a common problem: data science and machine learning portfolios often fail to impress employers because they are generic. It argues that replicating Kaggle projects is not enough, because employers want to see how you think, how you solve problems, and how you bring your personality into your work. The framework that follows is meant to help you create projects that differentiate you from other candidates.

How to pick a topic for data science/machine learning projects

The most important step in creating a standout project is choosing a topic that genuinely excites you. That interest carries you through the difficult parts, lets you draw on domain knowledge you already have, and lets your personality show, which makes the project more memorable. The video gives the example of a mentee who combined his passions for data science and DJing in a project that analysed his music collection and optimised playlist creation. Hobbies are a good starting point, but a project can also grow out of a professional interest, as long as you have a personal connection to the topic.

How to find data for data science/machine learning projects

To make your project stand out, avoid pre-cleaned, widely used datasets such as those from Kaggle competitions. Instead, consider gathering fresh data from public APIs, scraping relevant websites, finding unusual government data sources or niche industry surveys, generating your own data through experiments or surveys, or combining multiple datasets in novel ways. The DJ project is revisited here: the mentee used a library to extract features from his own music collection and combined them with metadata from his DJing software, which produced a dataset nobody else had. A second example is a project analysing neighbourhood walkability, where data from the Google Maps API, public transit feeds, and OpenStreetMap could be combined into a custom walkability score. Sourcing original data demonstrates resourcefulness, creativity, and the ability to work with messy, real-world information.
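As a rough illustration of the walkability idea, the sketch below combines three per-neighbourhood features into a single score. Everything in it is an assumption made for illustration: the feature names, the weights, and the sample rows are invented, and the video does not specify how such a score should be computed. In a real project the numbers would come from the APIs and open data sources mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Neighbourhood:
    name: str
    amenities_within_1km: int       # assumed: counted from a places API
    transit_stops_within_500m: int  # assumed: from a public transit feed
    footpath_km_per_km2: float      # assumed: derived from OpenStreetMap


def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative weights only (hypothetical); they sum to 1.
WEIGHTS = {"amenities": 0.5, "transit": 0.3, "footpaths": 0.2}


def walkability_scores(neighbourhoods):
    """Return {name: score from 0 to 100} as a weighted sum of scaled features."""
    amenities = min_max_scale([n.amenities_within_1km for n in neighbourhoods])
    transit = min_max_scale([n.transit_stops_within_500m for n in neighbourhoods])
    footpaths = min_max_scale([n.footpath_km_per_km2 for n in neighbourhoods])
    scores = {}
    for n, a, t, f in zip(neighbourhoods, amenities, transit, footpaths):
        combined = (
            WEIGHTS["amenities"] * a
            + WEIGHTS["transit"] * t
            + WEIGHTS["footpaths"] * f
        )
        scores[n.name] = round(100 * combined, 1)
    return scores


# Made-up sample rows, standing in for data gathered from several sources.
sample = [
    Neighbourhood("Old Town", 120, 14, 18.0),
    Neighbourhood("Riverside", 60, 7, 12.0),
    Neighbourhood("Outskirts", 0, 0, 6.0),
]
print(walkability_scores(sample))
```

Min-max scaling puts the three features on the same 0-1 range before they are weighted, so a feature with large raw values does not dominate the score. Choosing and justifying the weights is part of what makes such a project your own.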

Building an end-to-end solution

It's crucial to demonstrate that you can build complete solutions, not just models. That means an end-to-end pipeline covering data collection and storage, data cleaning and preprocessing, feature engineering, model training and evaluation, model deployment, and analysis and presentation of the results. For machine learning roles, it also helps to incorporate MLOps practices such as containerisation, CI/CD pipelines, model versioning, experiment tracking, model drift monitoring, and automated retraining pipelines. The DJ project is used again as an example, this time to walk through the full data science pipeline that was built. A second example is an app that diagnoses plant diseases from smartphone photos, which is used to outline the components of an end-to-end machine learning pipeline: data ingestion, model training, automated evaluation, deployment, and monitoring. This end-to-end approach shows that you understand the full life cycle of a data product.
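The stages listed above can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions, not the pipeline from either example project: the data is synthetic, the model is a one-variable least-squares fit, all function names are hypothetical, and storage, deployment, and monitoring are left out to keep it short.

```python
import random
import statistics


def collect(n=200, seed=0):
    """Stand-in for data collection: synthetic (x, y) rows with some gaps."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)
        # Simulate messy data: roughly 5% of rows lose their target value.
        rows.append((x, None if rng.random() < 0.05 else y))
    return rows


def clean(rows):
    """Drop rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_fraction=0.25):
    """Hold out the last part of the data for evaluation."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    return model["slope"] * x + model["intercept"]


def evaluate(model, rows):
    """Mean absolute error on held-out rows."""
    return statistics.fmean([abs(predict(model, x) - y) for x, y in rows])


def run_pipeline():
    """Run every stage in order and return the model with its test error."""
    rows = clean(collect())
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    return model, evaluate(model, test_rows)


model, mae = run_pipeline()
print(f"slope={model['slope']:.2f} intercept={model['intercept']:.2f} MAE={mae:.2f}")
```

Keeping each stage as its own function makes it easier to test stages separately and to swap one out later, for example replacing `collect` with a real API client or `train` with a framework model.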

GiveInternet.org

A brief interlude notes that 3 billion people worldwide lack internet access and promotes GiveInternet.org, an organisation that provides internet access and laptops to students in underserved communities. The organisation aims to connect talented students to education and opportunities, teaching skills such as coding, design, and entrepreneurship to foster economic independence. Donations to give.org/gratitudedriven will be matched.

What tools should you use?

When building your portfolio, use the tools and technologies that data scientists and machine learning engineers use in professional environments. These include cloud platforms such as AWS, GCP, or Azure, workflow orchestration tools such as Airflow, containerisation with Docker, version control with Git, industry-standard machine learning frameworks such as TensorFlow and PyTorch, and experiment tracking tools such as MLflow or Weights & Biases. Using these tools signals that you understand the practical realities of data science and machine learning.
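To make concrete what an experiment tracker records, the sketch below logs each run's parameters and metrics to a JSON Lines file and then picks the best run. It is a standard-library stand-in so that it runs anywhere, not the MLflow or Weights & Biases API; the helper names `log_run`, `load_runs`, and `best_run` are hypothetical, and the logged values are made up. In a portfolio project, one of the real tools would take its place.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one run (parameters and metrics) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_runs(log_path):
    """Read back every logged run."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(log_path, metric, lower_is_better=True):
    """Return the run with the best value for the given metric."""
    runs = load_runs(log_path)
    pick = min if lower_is_better else max
    return pick(runs, key=lambda run: run["metrics"][metric])


log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"

# Made-up values for illustration only.
log_run(log_file, {"learning_rate": 0.1, "epochs": 5}, {"val_loss": 0.42})
log_run(log_file, {"learning_rate": 0.01, "epochs": 20}, {"val_loss": 0.31})

print(best_run(log_file, "val_loss")["params"])
```

The point is the habit rather than the helper: every training run leaves a record of what was tried and how it scored, so results can be compared and reproduced later.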

How can you share your portfolio work?

Technical skills alone aren't enough to get hired; communication skills are also crucial. To make your project accessible and impactful, create a polished GitHub repository with modular, well-documented code, write a compelling README that explains the project and why it matters, include clear setup and running instructions, and create interactive visualisations or dashboards. Then share your work widely: write detailed blog posts, post on platforms such as LinkedIn, Twitter, Discord, and Reddit, create videos for YouTube, or present at local meetups or conferences. Sharing your work increases its visibility to potential employers and collaborators.

© 2024 BriefRead