OSINT: Dealing with ProPublica Data

iOS
2 min read · Nov 16, 2018


I really enjoy working with data retrieved from an API. The problem I keep running into is that the data is frequently dirty and unstructured and must be reworked before it is usable. After reading this guide, you will understand the steps I took to clean my data.

I have been working with the ProPublica API to retrieve political data about legislators and the bills they have sponsored. After reading the ProPublica documentation, I noticed I could get specific information about a legislator by passing a member_id as a URL parameter. According to the documentation, the response includes data about a legislator's roles, committees, and subcommittees, which is exactly what I need.

Please check out: https://projects.propublica.org/api-docs/congress-api/members/#get-a-specific-member
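A minimal sketch of that request in Python, which builds the call for one member without sending it. The base URL and the `X-API-Key` header reflect the documentation linked above as best I recall it, so verify both there; `build_member_request`, the placeholder key, and the example member ID are illustrative assumptions, not part of the original project.

```python
import urllib.request

# Assumed base URL for version 1 of the Congress API; confirm it in the docs.
BASE_URL = "https://api.propublica.org/congress/v1"


def build_member_request(member_id: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) the GET request for a single member."""
    if not member_id:
        raise ValueError("member_id must not be empty")
    # The member_id becomes part of the URL path.
    url = f"{BASE_URL}/members/{member_id}.json"
    # The API key travels in a request header, not in the URL.
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


# Example member ID and placeholder key, used only for illustration.
request = build_member_request("K000388", "YOUR_API_KEY")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call. Keeping the build step separate makes the URL easy to inspect before any network traffic happens.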

To extract the roles, committees and subcommittees from the response object, I need the following:

  1. Model file to represent the API JSON structure
  2. Client file to make the REST call and map the JSON to the model fields
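The two pieces can be sketched in Python roughly as follows: a few dataclasses stand in for the model file, and `parse_member` stands in for the mapping half of the client. The field names (`results`, `roles`, `committees`, `subcommittees`, `name`, `code`) are assumptions about the response layout, and the sample payload is hand-written placeholder data, not real API output.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Committee:
    name: str
    code: str


@dataclass
class Role:
    congress: str
    chamber: str
    committees: List[Committee] = field(default_factory=list)
    subcommittees: List[Committee] = field(default_factory=list)


@dataclass
class Member:
    member_id: str
    first_name: str
    last_name: str
    roles: List[Role] = field(default_factory=list)


def parse_committees(items: list) -> List[Committee]:
    """Map a list of JSON objects onto Committee instances."""
    return [Committee(name=item["name"], code=item["code"]) for item in items]


def parse_member(payload: str) -> Member:
    """Map the first entry of the "results" array onto the model classes."""
    raw = json.loads(payload)["results"][0]
    roles = [
        Role(
            congress=role["congress"],
            chamber=role["chamber"],
            committees=parse_committees(role["committees"]),
            subcommittees=parse_committees(role["subcommittees"]),
        )
        for role in raw["roles"]
    ]
    return Member(raw["member_id"], raw["first_name"], raw["last_name"], roles)


# Hand-written placeholder payload that only mimics the assumed layout.
SAMPLE_PAYLOAD = """
{"results": [{
  "member_id": "X000000", "first_name": "Jane", "last_name": "Doe",
  "roles": [{
    "congress": "115", "chamber": "House",
    "committees": [{"name": "Example Committee", "code": "EX00"}],
    "subcommittees": [{"name": "Example Subcommittee", "code": "EX01"}]
  }]
}]}
"""

member = parse_member(SAMPLE_PAYLOAD)
print(member.last_name, [c.name for c in member.roles[0].committees])
```

This mapping is deliberately strict: it raises a `KeyError` or `TypeError` as soon as a field is missing or null, which is the Python counterpart of a null pointer exception.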

When I tried to parse the subcommittees from the response, I kept getting a null pointer exception. That made me question the accuracy of my data, so I decided to investigate further.

  1. I looked at the raw response object (String result in the photo above). This let me see exactly which fields were actually present, and I noticed the subcommittees were empty.
  2. Next, I wondered whether the social media names were valid and up to date. After searching for the Twitter names on Twitter, I noticed that a few politicians had multiple Twitter accounts. This is a new problem that I still need to find a solution for.
  3. Lastly, I looked for null values and handled them accordingly.
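The first and last steps lend themselves to a small Python sketch: one helper that lists every null or empty field in a decoded response, and one that reads subcommittee names without failing when the field is missing. Both helper names and the sample payload are hypothetical, and the field names are assumptions about the response layout.

```python
import json


def find_empty_fields(value, path="$"):
    """Return the paths of every null, empty string, empty list, or empty object."""
    if value is None or value == "" or value == [] or value == {}:
        return [path]
    empty = []
    if isinstance(value, dict):
        for key, child in value.items():
            empty.extend(find_empty_fields(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            empty.extend(find_empty_fields(child, f"{path}[{index}]"))
    return empty


def subcommittee_names(role: dict) -> list:
    """Read subcommittee names, treating a missing or null field as an empty list."""
    items = role.get("subcommittees") or []
    return [item["name"] for item in items if item.get("name")]


# Hand-written placeholder payload, not real API output.
sample = json.loads("""
{"results": [{
  "member_id": "X000000",
  "twitter_account": null,
  "roles": [{
    "congress": "115",
    "committees": [{"name": "Example Committee"}],
    "subcommittees": []
  }]
}]}
""")

member = sample["results"][0]
for path in find_empty_fields(member):
    print("empty:", path)
# empty: $.twitter_account
# empty: $.roles[0].subcommittees

print(subcommittee_names(member["roles"][0]))  # []
```

Listing the empty paths first shows where the data is thin before any mapping code runs, and the `or []` fallback keeps a null field from crashing the parse.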

After dealing with the dirty data, I was able to get all the information I needed from ProPublica. I now understand how important the data cleaning process is, and how unstructured and inconsistent data can and will lead to misleading results.

Thanks again for reading my article. The next article will cover how I used the fec_id from ProPublica to retrieve financial information from the MapLight API.

Please check out the links below.

Resume website — https://tommarler.org

Linkedin — https://www.linkedin.com/in/tom-m-bb4857112/

If you like data, check out the Programming Historian.

Programming Historian: https://medium.com/@tommarler/osint-the-programming-historian-1d9129439898


Written by iOS

iOS Developer, Go, Java, C#, Blockchain enthusiast, Data junkie
