Seeing the Job Market with Python
Isaiah Gerald
Feb 16

Long before your first day as a developer, before that initial interview, and even before you start sending out applications, there are several crucial elements that must be worked out to define your path and set up your future career.
You have to navigate the mass of job titles out there to find the real dividing lines between the underlying careers in the current market, whittle down the swath of tools you could learn to those at the core of the career you choose, and then build a strong knowledge base around those tools so that your perspective shifts from what “sounds good” and “pays well” to what you can really see yourself doing for the rest of your working life.
Just starting this process can seem daunting, given the millions of articles, resources, and opinions readily accessible to anyone in today’s internet-driven environment.
Tackling this beast can be approached from just about any direction.
It could be a series of trial and error, testing the waters within each area and each platform, but that could also end up being a time-consuming, and ultimately wasteful, exercise.
You could go with your gut feeling and follow a strictly emotional directive, pushing through any headwinds and overcoming any obstacles, but this could land you in a job you never honestly wanted.
Finally, you could take a logical approach to the search for your passion, employing the tools of the industry to build a data-backed case for why you should pursue a specific career path or learn a distinct set of skills, centered on tools that will give you a real chance of landing a job.
While I’m not suggesting that this be the only input you use to make these big, life-altering decisions, using the tools of the trade to get better visibility into what’s out there can give you a feel for what you could be doing, show you what chance you have of getting there, and maybe even prevent you from making the wrong decision despite what you might hear, see, or feel.
Recently, putting these words into action, I used a combination of Python and Excel to improve my awareness of what the current job prospects were like in several cities in my home state of Texas as well as nationally.
This not only aided my understanding of key concepts such as concurrency and web scraping, but also spared me a massive amount of time that I would otherwise have spent clicking through endless links and taking down countless bits of data.
Starting with the key identifiers I would use to find this data, I looked at several websites to find one I could reference to build a list of job titles and one I could use to find a simplified list of programming languages.
Instead of just copying and pasting these terms into a list, I used a combination of Python’s Requests and pandas libraries, with some quick, untidy code, to pull the terms from these websites and format them to my liking.
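The original scraping functions are not reproduced in this text. As a rough sketch of the idea, assuming the terms sit inside list items or headings on the source page, something like the following could download a page with Requests and pull out the text of chosen tags. The function names, the choice of tags, and the use of the standard library's `HTMLParser` for parsing are illustrative assumptions, not the code used in the project.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of the given tags."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self.depth = 0      # greater than 0 while inside a wanted tag
        self.buffer = []    # text pieces of the item being read
        self.items = []     # finished, whitespace-normalized items

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted_tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.items.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_terms(html, wanted_tags=("li",)):
    """Return de-duplicated tag texts from an HTML string, in page order."""
    collector = TagTextCollector(wanted_tags)
    collector.feed(html)
    return list(dict.fromkeys(collector.items))


def fetch_terms(url, wanted_tags=("li",)):
    """Download a page with Requests and extract terms from it."""
    import requests  # imported here so extract_terms also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_terms(response.text, wanted_tags)
```

The list that comes back could then be loaded into a pandas Series for any further cleanup, such as stripping numbering or filtering out navigation items.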
Function to Build List of Job Titles
Function to Build List of Programming Languages

From there I laid out a detailed plan of how I would build the final data frame and what columns it would consist of.
I ultimately decided to construct the bulk of the data frame up front, assembling every combination the data allowed.
These consisted of four fields: the language or job title, several possible salary ranges, the target city, and the state.
Despite focusing mainly on major cities near me in Texas, I included the state field to capture a national-level search just for comparison.
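One way to picture assembling every combination up front is a Cartesian product over the search terms, salary levels, and locations. The sketch below is only an illustration: the function name, the dictionary keys, the sample values, and the choice to treat city and state as a pair (with an empty pair standing for a national search) are all assumptions.

```python
from itertools import product


def build_search_rows(terms, salaries, locations):
    """Return one dict per (term, salary, location) combination.

    `locations` holds (city, state) pairs; ("", "") stands for a
    national search. An empty salary string means "no salary filter".
    """
    rows = []
    for term, salary, (city, state) in product(terms, salaries, locations):
        rows.append(
            {"term": term, "salary": salary, "city": city, "state": state}
        )
    return rows


# Placeholder values, not the lists used in the project.
rows = build_search_rows(
    terms=["Python", "Web Developer"],
    salaries=["", "$80,000"],
    locations=[("Austin", "TX"), ("", "")],
)
print(len(rows))  # 2 terms x 2 salaries x 2 locations = 8
```

A list of dictionaries like this can be handed directly to `pandas.DataFrame(rows)` to get one row per search.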
Since I was using Indeed.com as the source of my data, two additional fields held the possible variants the URL pattern could take to access the website.
To build this data frame I declared the variables ahead of time in a systematic manner to avoid confusion and easily devise a strategy.
This included defining two possible patterns for each search type, each consisting of the title or language combined with or without a base-level salary estimate, with or without a target city, and with or without a state.
The address had to be formatted so that the various options could be swapped in easily, each corresponding to the variable in its adjacent field.
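To make the idea of a swappable address concrete, here is a minimal sketch of a URL builder with optional salary, city, and state. The base address and the `q` and `l` query parameters reflect how Indeed search links have commonly looked, but they are assumptions here, as are the function name and the way the salary is appended to the search term. The project's actual patterns are not shown in this text.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"  # assumed search endpoint


def build_search_url(term, salary="", city="", state=""):
    """Build a search URL, leaving out any option that is empty."""
    # The salary estimate is assumed to ride along in the query text.
    query = f"{term} {salary}".strip()
    # City and state are joined only when present, e.g. "Austin, TX".
    location = ", ".join(part for part in (city, state) if part)

    params = {"q": query}
    if location:
        params["l"] = location
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("Python", "$80,000", "Austin", "TX"))
print(build_search_url("Web Developer"))
```

Letting `urlencode` handle the escaping keeps spaces, dollar signs, and commas from breaking the address when options are swapped in.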
Variable Declaration for Data Frame

With that done, building the data frame was somewhat easy.
Just like stacking bricks, it was built up sequentially from a solid base.
First the data frame header was established as a variable and then passed through a build function to construct blank fields within the data frame to populate.
The function first built each data frame field as a list and then used these lists to define the shape of the blank data frame.
Then the data frame was populated with the appropriate list in each location, leaving one blank field to hold the job count for each search.
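The build step described above might look roughly like the following with pandas. This is a sketch under assumptions: the column names in `HEADER`, the function name, and the sample values are hypothetical, and the original build function may have been organized differently.

```python
import pandas as pd

# Hypothetical header; "job_count" is the field left blank for later.
HEADER = ["term", "salary", "city", "state", "url", "job_count"]


def build_frame(header, columns):
    """Create a blank frame shaped by the column lists, then fill it."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all column lists must have the same length")
    n_rows = lengths.pop()

    # A blank frame: every cell starts out as NaN.
    frame = pd.DataFrame(index=range(n_rows), columns=header)

    # Drop each prepared list into its column; untouched columns stay blank.
    for name, values in columns.items():
        frame[name] = values
    return frame


frame = build_frame(
    HEADER,
    {
        "term": ["Python", "Java"],
        "salary": ["", "$80,000"],
        "city": ["Austin", "Dallas"],
        "state": ["TX", "TX"],
        "url": ["placeholder-url-1", "placeholder-url-2"],
    },
)
print(frame)
```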
Initial Data Frame Build

Turning my focus to the core aspect of the project, I built the child function that was to be run concurrently to extract the number of jobs for each set of criteria.
Using Python’s built-in threading module to run this function concurrently was crucial to the time savings of this project.
This was somewhat tricky, though, due to the sheer number of possible searches.
To avoid opening a plethora of concurrent requests and overloading the system, I placed several barriers to entry in front of the concurrently run functions, forcing them into a kind of queue to wait.
Slowing down their admission without hindering their ability to jump in to any great degree seemed as though it might present some difficulties.
However, in the end, a combination of simple logic statements and error handling to redirect bad links made this relatively easy.
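As an illustration of a child function that waits its turn and survives bad links, the sketch below uses a `threading.BoundedSemaphore` as the gate. That is a swapped-in technique, not necessarily the logic statements the project relied on. The limit of 8, the function names, the injected `fetch` callable, and the "of N jobs" wording matched by the regular expression are all assumptions.

```python
import re
import threading

MAX_CONCURRENT = 8  # assumed limit on simultaneous requests
gate = threading.BoundedSemaphore(MAX_CONCURRENT)
results_lock = threading.Lock()

# Assumed wording of the results page, e.g. "Page 1 of 1,234 jobs".
COUNT_PATTERN = re.compile(r"of\s+([\d,]+)\s+jobs")


def parse_job_count(page_text):
    """Return the job count found in the page text, or 0 if none is shown."""
    match = COUNT_PATTERN.search(page_text)
    return int(match.group(1).replace(",", "")) if match else 0


def count_jobs(index, url, fetch, results):
    """Fetch one search page and record (index, count) in `results`.

    `fetch` is any callable that takes a URL and returns page text.
    """
    with gate:  # blocks while MAX_CONCURRENT workers are already active
        try:
            count = parse_job_count(fetch(url))
        except Exception:
            count = None  # bad link or failed request; keep the row aligned
    with results_lock:
        results.append((index, count))
```

In real use `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`. Passing it in as an argument also makes the function easy to try out with canned page text.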
Child Function (Run in Parallel)

To wrap up the main body of the script, a parent function was built to manage the concurrently run functions and pass the final job count list, in order, to the blank field in the existing data frame.
Within this function some more limits were placed on the number of concurrently running functions by nesting while loops inside the for loop that starts the threads.
The index for each set of criteria was also passed to the parallel functions and returned in the final list to allow the job counts to be sorted and correctly aligned with each row.
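A parent function along these lines could be sketched as follows: a for loop starts one thread per link, a while loop inside it waits whenever too many threads are alive, and the index carried with each result is used to restore row order at the end. The names, the thread limit, and the polling interval are illustrative assumptions, and `child` stands for any function that takes a URL and returns a count.

```python
import threading
import time


def run_all(urls, child, max_threads=8, poll_seconds=0.01):
    """Run `child(url)` for every URL in threads; return counts in URL order."""
    results = []  # (index, count) pairs, appended in completion order
    lock = threading.Lock()
    threads = []

    def wrapper(index, url):
        try:
            count = child(url)
        except Exception:
            count = None  # keep a slot for this row even if the call failed
        with lock:
            results.append((index, count))

    for index, url in enumerate(urls):
        # Hold here until fewer than max_threads workers are still running.
        while sum(t.is_alive() for t in threads) >= max_threads:
            time.sleep(poll_seconds)
        thread = threading.Thread(target=wrapper, args=(index, url))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    # Threads finish in any order, so sort by index to realign with the rows.
    results.sort(key=lambda pair: pair[0])
    return [count for _, count in results]
```

The returned list lines up with the input order, so it can be assigned straight to the blank job count column.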
Parent Function

The last thing to do at this point was tie it all together and watch the magic happen.
To do this, the output of the data frame construction function was assigned to a variable and then run through the parent function.
With an additional field tacked on to the data frame to better differentiate between job title and programming language, the data frame was then saved to an Excel file for further evaluation.
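Tacking on the extra field and saving the result might look something like this with pandas. The column names, labels, function names, and file names are hypothetical, and `to_excel` needs an Excel engine such as openpyxl installed.

```python
import pandas as pd


def label_term_type(frame, languages):
    """Return a copy of `frame` with a column marking each term's type."""
    labeled = frame.copy()
    is_language = labeled["term"].isin(languages)
    labeled["term_type"] = is_language.map(
        {True: "language", False: "job title"}
    )
    return labeled


def save_results(frame, excel_path="job_counts.xlsx",
                 pickle_path="job_counts.pkl"):
    """Write the frame to an Excel file and to a pickle file."""
    frame.to_excel(excel_path, index=False)  # requires e.g. openpyxl
    frame.to_pickle(pickle_path)


# Placeholder data, not results from the project.
example = pd.DataFrame(
    {"term": ["Python", "Web Developer"], "job_count": [10, 5]}
)
labeled = label_term_type(example, languages=["Python", "Java"])
print(labeled)
```

Keeping a pickle copy alongside the spreadsheet makes it easy to reload the exact data frame in Python later without re-running the searches.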
All in all, it took 7 minutes to process around 3,700 links and pull the number of jobs from each one. Not bad, in my opinion.
Final Data Extraction and Storage

Despite having used Python’s Matplotlib and Seaborn libraries heavily in the past for data visualization, and truly enjoying the ability to customize and create stunning charts and graphs, I decided to use Excel pivot tables to save further time.
If time permits, I may apply these tools to a pickle file saved from the data at the end of the run, but in any case here’s the final result.
Final Data in Excel Spreadsheet

Using this data I was able to see how many jobs are available for each of my chosen job titles and programming languages, at each of the chosen pay scales, and within each of my target locations.
Some of these results are shown below.
National Jobs by Job Title
All Jobs found for Job Titles Selected with those that had no salary estimate
Jobs by Title with Salary estimates only
Jobs by position and city in Texas
Jobs by Programming Language in Texas

Job Titles and Programming Languages come from the following sources
Job Titles: https://www.com/19-types-of-developers-explained/
Programming Languages: https://www.com/tiobe-index/
Image Credit: https://www.