In this week’s episode, UNT Frisco Assistant Professor Abdulrahman Habib joins Scott in the studio to discuss the Advanced Data Analytics Program, how to use open data sets, and what to expect at the upcoming event DFW Open Data Day.
SHOW NOTES:
[01:30] Advanced Data Analytics Master’s program at UNT
[03:32] What is DFW Open Data Day
[08:48] Where open data is published
[13:31] Examples of using open data
[14:45] DFW Open Data Day information
[16:18] Data sets and transportation
LINKS & RESOURCES:
- UNT Frisco
- University of North Texas
- DFW Open Data Day Info:
- Abdulrahman Habib: Bio | LinkedIn
Transcript
Machine-generated.
Welcome to UNT Unplugged. I’m your host, Scott Ellis, and I am here with, he’s just got one name, like Prince. We’re going to call him Habib. Welcome to the show, Habib. Thanks for joining us today. We’re going to talk a little bit about what you do with UNT and an event that you have coming up.
Sure. Um, I teach in the advanced data analytics program at UNT Frisco, and I’m here today for Open Data Day, or DFW Open Data Day, which we are hosting in a couple of weeks.
Awesome. So we’re going to talk about that in just a minute, but I want to start a little bit, uh, just by way of what you do at UNT and that. So you’re an associate professor.
I’m an assistant professor.
Assistant professor, yes. And you teach, uh, advanced data analytics.
So I teach a bit of stats and a bit of data visualization and some basic coding here and there, big data, uh, infrastructure and data engineering.
Very cool. So my wife is actually taking your class right now. Um, and I would not define her as a quantitative type, a quote-unquote quant, necessarily. But she’s getting a lot out of it and I think she’s enjoying it, albeit with maybe a little bit of frustration at times, but, uh, fortunately she’s got a technology person there to help her out once in a while. She mentioned to you my anytime I do try to keep her sane in that respect. But no, it sounds like an interesting course and it’s definitely the kind of thing that I would be interested in taking. So if somebody is interested in learning more about data science in general and they’re interested in this master’s course at UNT, what do they have to look forward to? What are they going to learn while they’re there?
Um, so first of all, um, this program is a master’s program. Uh, you need to have a bachelor’s degree to apply to it. Um, it’s a very flexible, open type of program that’s mainly geared toward practitioners, so people who are working full time, um, and we teach multiple topics, so you will get introduced to many areas within data science. And then you will take it from there to develop yourself further and maybe get that internship or work in industry in that area. So if you are changing careers, or if you are interested in just learning about this area within IT or within computer science, uh, this is your place. Um, again, this program is meant to be for practitioners, so there are not that many requirements or prerequisite courses. We go through the basic details in every single course. Most of the courses are hybrid, eight-week courses. Uh, so it’s a bit condensed, but it’s to the point, and it gives you, um, a head start with any of the topics we have.
I think it’s cool that you guys have this at UNT because, um, in all of the different companies I’ve worked with, data science and data scientists are becoming a pretty hot commodity right now. People that can figure out how to get their arms wrapped around all the big data that’s being produced, and then actually go and make some kind of sense out of it so that the business can make decisions, are just more and more in demand. I don’t think that’s going away anytime soon. So definitely something I would encourage people to think about if they’re interested in that kind of thing.
Yeah, the program is super cool. We have a group from all different walks of life, uh, in the program, and you will learn multiple things. Again, you’ll get out of your comfort zone, that’s for sure. There are many topics and many things you need to learn. But I think it’ll give you the basics to start with, and it will accept students from different backgrounds. We have them from stats and chemistry to business, to even healthcare. So good deal.
Yeah. Okay. So let’s change gears and talk a little bit about the event that’s coming up cause that’s really what we’re here to discuss today. So why don’t you tell us what is the event and when’s it happening and we’ll go from there.
Let’s start with what open data is. Open data is government data that’s available to you for free. To give you a basic example, now everybody is using GPS. GPS started as open data: the whole system was opened to the public after, um, after it started with the military, and now everybody uses it. So similarly, governments are now publishing data about all of their operations. You can think about it from, um, inspections, restaurant inspections, where Yelp is using that data to tell you this place is nice or this place is clean, and this is the inspection rating for this place. This is all open. If you’re using even Google Maps today, you’re using open data in another way, because cities are publishing their construction, their permitting and all this information, and other companies are scooping up that information and embedding it within their systems. If you do taxes or if you do charities, many of those industries are also using a lot of open data to understand and to, um, analyze the systems and what’s available and how people interact and so on. So in this event we are trying to promote the use of open data through our work with the cities. So we have the city of Dallas, the city of Frisco, um, hopefully Arlington as well, and the North Central Texas Council of Governments. They’re all coming together to present the challenges. It’s a hackathon event. So they present a challenge and everybody there tries to pick a challenge. We’ll have multiple ones, and sometimes some groups come up with their own challenge as well, because the open data portals are available and all of that data is in there. So you can bring your own challenge, your own data set and even your team and work in there, in a nice environment where everybody’s encouraged to learn and practice and make sense of the data.
So we are trying to do good with our knowledge in information technology overall, or coding. Now, this event is not only for coders and not only for geeks. Um, this event is built for the general public. So if you’re interested in data and you want to learn more, you can attend the event. We have a couple of workshops going on alongside the event. You will get to learn how to use the open data portal: what the portal is, what they publish in there, and how you can benefit from it in your business, whether you’re in real estate or in the restaurant business or any other business. Or even as an individual, if you want to follow up on your local government and see what they are doing and learn more about what’s happening there, uh, we will show you tools and tricks and different things to, um, evaluate or follow up on what’s available in their open data portal.
Wow. That is interesting, because I think a lot of folks don’t realize how much data is available to them to use for free. Um, and the government publishes a lot of data and a lot of content, even, that’s free for you to go out and use on your own. I know this isn’t really data specifically, but one example is a lot of the, um, if not all of the images that NASA captures. You know, you can go out there and you can download those, you can use them, because quite frankly, as a taxpayer, you’ve already paid for them. Um, and so governments have opened up a lot of that data. I didn’t know about the GPS, although that makes sense. I’ve used some, um, GIS data in the past, um, and I don’t, it’s geologic something or other, graphical information, geographical information. Thank you very much. Um, in some 3D rendering I did where we were creating landscapes. And you can actually go out and download data that will allow you to render, you know, the Grand Canyon or something like that; they’ve got the data for that. So it’s interesting.
Tons of data sets are available from each of the local governments participating with us, and from other government agencies as well. Um, hopefully the county will be with us as well. So different groups are coming with different challenges. They want to prototype, or do a proof of concept, or check what can be done with the available data sets. Or sometimes they have a challenge, like in transportation this year: we have this challenge with transportation, we have the data, can you do anything with it? Can you predict? Um, last year in this building we had an event, which is Hack North Texas, and we had a data set from Waze about traffic for the DFW area. The team who won that event built a, um, machine learning algorithm to predict traffic or accidents within the traffic areas. Um, of course it’s a small event, so the things that come out of it are just prototypes, not fully functional. Sure. Uh, but that gave them an idea of what can be done with this, and I think they, uh, took it from there and they’re now developing different models to support the operations they have. Um, so we are prototyping the technology: what’s available, what can you do with this, how can you benefit from it, as I said. And if you’re an entrepreneur and you want to build on it or do something with it, or you’re looking for a team or looking for inspiration, this is one of the events to get you inspired in that area.
It sounds like an awesome event. Now it’s got me thinking, like, what are things I could be doing, what challenges could I come and pose? Is the open data published in a common format? And part two, if somebody wants to work with that data, is there a language that is preferred for manipulating it or generating reports, things like that?
Very good question. So first, the open data portal is where the city puts its data. It’s an online portal. You go there and you will see their data sets. Usually those data sets are updated frequently, either every day or every week, depending on the data set. The data sets are there in non-proprietary formats, so they are usually CSV text files, and you can download them and do anything you want with them. And again, the rights for using it are open, so you can use it, reuse it, redistribute it, use it in your application or whatever you want to do in your business. That’s open for you, no questions asked. You don’t need to ask anybody for permission. You don’t need to follow up with any agencies or anybody to download the data set. You can also link to the data sets through the APIs. So if you know what you’re doing, you can link directly to those data sets and benefit from them within your business without going and opening everything yourself. That’s what other companies are doing when they build on top of open, available data sets.
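To make the point about CSV text files concrete, here is a minimal Python sketch of reading a downloaded portal file and summarizing one column. The sample data, column names, and helper functions are all hypothetical, invented for illustration; a real portal’s files will have their own columns, and nothing here reflects any specific city’s data or API.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file downloaded from a city
# open data portal. The column names and values are invented.
SAMPLE_CSV = """permit_id,permit_type,issue_date,zip_code
1001,Building,2020-01-06,75034
1002,Electrical,2020-01-07,75034
1003,Building,2020-01-09,75035
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    counts = {}
    for row in rows:
        key = row[column]
        counts[key] = counts.get(key, 0) + 1
    return counts


rows = load_rows(SAMPLE_CSV)
permit_counts = count_by(rows, "permit_type")
print(permit_counts)
```

With a real download, the same two functions would work on the file’s contents read from disk; only the column name passed to `count_by` would change.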
Okay. And what about languages for working with those data sets? Is there anything particular that,
No, it’s open. So once you download the data set, it’s up to you what to do with it. We mentioned GIS, for example. If you’re a GIS person and you want to do some analysis on a GIS system, that’s open for you. If you want to download the text file for a class project, let’s say for our students, that’s open; you can use it and try to find a use for it or do a project with it. I have a couple of students who have been working with open data with me for the past couple of years. That’s either their thesis or dissertation project, and many classes at UNT and at other universities have used it within their classes.
Okay. Very interesting. Um, I know one of the languages that does pop up a lot, and I’m just drilling down on this a little bit ’cause it’s of interest to me personally, um, is, uh, Python, correct? Isn’t that pretty common for,
Yeah. So Python and R and I need to say also Excel sheets. So, okay, those are open. So if you’re just good in Excel, you can do a lot of things in Excel.
It’s a super powerful tool. I think, you know, a lot of us use it in a very basic manner relative to what it’s capable of.
So it doesn’t matter if you’re as fancy as going into models and building models in R or Python, or using machine learning and neural networks to do something, that’s good, that’s really cool. Or even if you’re learning and you’re doing just basic Excel, or even if you’re not technical at all. So we had a team who showed up, um, two years ago in one of the events, and they’re totally not technical. And they said, hey, what do we do in such an event? So they brainstormed a challenge by themselves, and they decided the open data portal is not easy to understand. Uh, at that time we had Denton as our main challenge; we hosted the event for about five years earlier in Denton. Now we’re moving to make it DFW Open Data Day. Um, so they said, oh, the portal doesn’t have a video, and I’m intimidated by the number of data sets. Um, so they decided to make a video on using the open data portal for the city, and we worked with the city later on, after a couple of months. Now that video is still hosted as an introduction to the open data portal for the City of Denton.
Do you remember the name of that video? Is it on YouTube?
It’s on the website. So if you go to data.cityofdenton.com, and About Us, which explains what the portal is, you will have the link for YouTube, uh, from the people who made that video.
Excellent. Again, that’s data.cityofdenton.com [inaudible]. Okay. Are there other links that you know off the top of your head, places that people can go to look at some of the data sets that are out there?
The easiest is data.gov. That’s the U.S. open data portal. And again, the city of Dallas. If you Google almost any city today, and you just say data and the name of the city, regardless of their domain, because domains change and some of them use .gov, some of them .com, others .org, but anyhow, if you just Google it with data or open data portal, you will see their portal. Then once you’re in the portal you will see what your city is offering, and some of them do a call for action. Um, I will do a shout-out for Austin, because this is one of the very good open data portals available: a ton of information, a ton of even research and things happening in there. Um, lots of cool systems in the back end too. You can see so many things, from scooters and movement and cars, um, to even trash collection; the whole nine yards is available in, uh, the city of Austin, that’s in Texas, and they’re doing very well. So pop in there and see what’s on that website and how you can use it, whether for a class, for business, whatever you like.
Interesting.
Journalism. That’s another example, using it in journalism. Yeah. So I will give you a very nice, tricky example, in the city of New York. So one of the journalists, or at least a guy in the city of New York, looked at the traffic citations for a specific traffic citation he used to get in a very specific area. Then he narrowed it down: this type of traffic citation only happens in certain spots, and the right signage is not there and the right color coding is not there for that specific, uh, citation. Um, so he argued with the city, he presented all his findings through the open data portal, which is where he found the data, and that’s been changed since then. So it’s not only reporting things and looking at how the government is operating and what’s going on in there; you can also make a change and promote the change you want, because you have the evidence, and you have supporting evidence from the city data to back you up in rallying for your cause.
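The kind of narrowing-down described in that example can be sketched in a few lines of Python: filter citation records to one violation type, count them per location, and flag locations that account for an outsized share. The records, field names, violation codes, and the 50 percent threshold below are all hypothetical, chosen only for illustration; this is not the actual analysis or data from the New York case.

```python
from collections import Counter

# Hypothetical citation records. Locations and violation codes are invented.
citations = [
    {"violation": "NO_PARKING_ZONE", "location": "Main St & 1st Ave"},
    {"violation": "NO_PARKING_ZONE", "location": "Main St & 1st Ave"},
    {"violation": "NO_PARKING_ZONE", "location": "Main St & 1st Ave"},
    {"violation": "NO_PARKING_ZONE", "location": "Main St & 1st Ave"},
    {"violation": "NO_PARKING_ZONE", "location": "Oak St & 5th Ave"},
    {"violation": "NO_PARKING_ZONE", "location": "Elm St & 9th Ave"},
    {"violation": "EXPIRED_METER", "location": "Main St & 1st Ave"},
]


def locations_for(records, violation):
    """Count citations of one violation type at each location."""
    return Counter(
        r["location"] for r in records if r["violation"] == violation
    )


def concentrated(counts, share=0.5):
    """Return locations holding at least `share` of all counted citations."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [loc for loc, n in counts.most_common() if n / total >= share]


counts = locations_for(citations, "NO_PARKING_ZONE")
hotspots = concentrated(counts)
print(counts.most_common())
print(hotspots)
```

A location that dominates the counts is only a lead, not a conclusion; in the story above, the follow-up step was checking the signage at those spots in person.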
Interesting. I like that. So they probably put up the correct signage so that they can keep adding citations?
Yes.
Okay. So when is the event this year?
So the event is on March 7th at UNT Frisco.
Yeah. So it’s coming up quick.
Yes.
And is it one of the Hall Park locations, or is it here in Hall Park?
In Hall Park.
Okay. What do people need to know if they want to register, take part, or just come out and check it out?
Google DFW Open Data Day. We have dfwopendataday.com as well. Uh, we have an Eventbrite event, so you can just RSVP there. The event is free and we will have, of course, food as well. So show up, bring your laptop. Don’t come without your laptop, because we can’t do anything without it. So bring your laptop, computer or whatever you want to work on. If you have an idea, that would be great. If you don’t, just show up that day and you will see what we have. The challenges are coming from the cities partnering with us. And I’m sure you will get to know a lot of people and you’ll get to network with them. It’s a fun day, a fun day doing coding or a fun day doing, um, good with your skills.
What time does it start?
So we’ll start at nine, up to four.
Okay.
So it’s almost half a day with a couple of breaks, and it’s a laid-back type of environment. So you will get into teams after we present the challenges. Uh, we get into teams, and each team works on their challenge. Then at the end of the day we wrap up and they present their findings, and usually we have a sponsor to, um, sponsor some of the giveaways in there, so that we just make it fun and people will walk out with some nice stuff in their hands too.
I love it. Yeah.
And are there any challenges you know of already that people are bringing that you can talk about? If you can’t, that’s okay. I just thought I’d ask.
Yes. So, um, we are working with the city of Dallas and hopefully DART and the other cities to make it more of a transportation challenge.
Okay.
So we might end up with a really good data set about transportation, movement and tracking within the city or neighboring cities. And the challenge will be how to utilize this information. What can you do with it? Maybe we’ll have a specific question on certain data sets, between public transportation and private transportation, and how we can do multimodal transportation, and what types of challenges we have. Hopefully we might also aggregate some of the traffic accidents and look at how to evaluate the causes of the accidents and maybe, um, make, um, an action list to help, um, reduce that, uh, number of accidents.
Fantastic. Yeah, it seems like traffic, whether it’s Dallas, Frisco, or anywhere in between, is becoming more and more and more of an issue and a topic that people are wanting to and trying to tackle in different ways.
I think even better, now we have more data. Yeah. So think about Uber, Lyft, even the scooters. They’re not only moving around; we have very specific data about them. And if you have this data, now you can do things with it. So this is a new area. It’s a new, promising area. Even the street fixtures nowadays are collecting data, and you can aggregate some of this data to know more and to make an informed decision about how you want to fix it or what you are going to do. Um, that wasn’t available before. So this is a big difference between today and a couple of years ago.
Yeah, I love it. I know in some of the conversations we’ve had in Frisco, um, and I’ve heard Mayor Cheney allude to this several times, that, you know, once upon a time the thinking was, if there’s more traffic, we just need more roads, we need more lanes. We need to expand all of that. And we quickly learned that adding more lanes just means more lanes of congestion. Um, you know, like there’s a highway I think in China that’s something ridiculous, like 20 lanes wide, and it’s still just packed with traffic all the time. And that clearly was not the solution they were looking for. Um, so they’re looking at innovative ways of how we get people moving around and inside of the city of Frisco, just to alleviate some of the traffic headache that already exists. They say build it and people will come, and they do, at least when it comes to roads, for sure.
So yeah, so there are other things we can do, from analyzing this data to looking at what’s in there, looking at the streets. So I have a student who, um, is analyzing data from, um, the city of San Francisco, or the California area, the Bay Area. And we have a ton of data from sensors, from, uh, the cars, from the DOT as well. And he’s building different prediction models using, um, um, CNNs to predict what the causes are and how you can alleviate, even, rerouting situations, and how rerouting affects other roadways. So having the data is the first step to improve the process and see what we can do. And it’s again trial and error. So those events will help prove a concept or build something that wasn’t there before. I’ll give you another example from last year in Dallas. We had a group who came to the event. Um, we had a couple of challenges but they came with their own. They live in downtown Dallas and they’re always annoyed with having a lot of events in the area, basically with the American Airlines arena there, and they live nearby. So they actually, um, built an Alexa skill to ask what events are going on in Dallas in that area, when they are, and what the traffic is in that area. So they can prepare themselves when they’re going out or staying in the house.
Fantastic. Yeah. So clever. Yeah. I love seeing the things that people do. You give them the data and let them loose and see what they come up with.
So that’s the promise of the events. We are looking at something that’s not that specific: people are coming from different walks of life, and again, technical, non-technical, and they’re all looking at what can we do with this? Can we improve it? What’s the challenge? Or they’re coming up with their own challenge, if they want.
I love it. So we’re going to welcome all these people, uh, to Frisco for DFW Open Data Day on March the seventh, yes, kicking off at nine o’clock. So we’ll link all this up in the show notes so you guys can easily find what you’re looking for. Go out there and be sure to register and hit that Eventbrite link and sign up, and we will see you on that Saturday. Thank you Emily. Dr. Habib, thanks for joining us. It was a fun conversation, and thanks to all of you for tuning into UNT Unplugged. We’ll talk to you next time.