danial.shabbir DanielOX

Data Engineer / Backend Engineer 🔨💻

DanielOX / Jazzcash parser.py

Last active January 14, 2024 12:38

JazzCash: Parse TID and Mobile Number in Python

	import pandas as pd
	import re

	# Path to the archived JazzCash messages

	jazzcash_file = "./jazzcash.txt"

	# Iterate over the messages and separate received-cash messages from spam

	with open(jazzcash_file) as f:
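The description promises the transaction ID (TID) and mobile number from received-cash messages. Below is a minimal sketch of that step under stated assumptions. The actual JazzCash SMS wording does not appear here, so the sample archive text, the blank-line message separator, the `TID_RE` and `MOBILE_RE` patterns, and the `parse_received` helper are all hypothetical. The mobile pattern assumes 11-digit Pakistani numbers beginning with `03`.

```python
import re

# Hypothetical archive text: messages separated by blank lines.
# The wording below is an assumption, not the real JazzCash SMS template.
SAMPLE_ARCHIVE = """You have received Rs 500.00 from 03001234567. TID: 1234567890.

Congratulations! You have won a prize. Reply now.

You have received Rs 1200.00 from 03219876543. TID: 9876543210."""

# Assumed patterns: a numeric TID after the literal "TID", and an
# 11-digit Pakistani mobile number starting with "03".
TID_RE = re.compile(r"TID[:\s]*(\d+)")
MOBILE_RE = re.compile(r"\b(03\d{9})\b")


def parse_received(archive: str) -> list:
    """Return one {'tid', 'mobile'} dict per received-cash message (hypothetical helper)."""
    rows = []
    for message in archive.split("\n\n"):
        # Keep only received-cash messages; everything else is treated as spam.
        if "received" not in message.lower():
            continue
        tid = TID_RE.search(message)
        mobile = MOBILE_RE.search(message)
        if tid and mobile:
            rows.append({"tid": tid.group(1), "mobile": mobile.group(1)})
    return rows


print(parse_received(SAMPLE_ARCHIVE))
```

If you want a table, as the gist's `pandas` import suggests, `pd.DataFrame(parse_received(text))` builds one with `tid` and `mobile` columns.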

DanielOX / EasyPaisa Parser.py

Last active January 14, 2024 12:38

EasyPaisa Text Messages Parser in Python

	import pandas as pd
	import re

	# Path to the archived EasyPaisa message text file

	easypaisa_file = './easypaisa.txt'

	with open(easypaisa_file) as f:
		data = f.read()
	transaction_message = []
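The preview stops after the file is read into `data` and an empty `transaction_message` list is created. Here is a hedged sketch of how that list might be filled: split the archive into messages, keep the ones that look like transactions, and extract the amount. The gist's variable names are reused, but everything else is an assumption. That includes the sample `data` text, the `received`/`sent` keywords, and the `Rs.` amount pattern, since the real EasyPaisa message wording is not shown.

```python
import re

# Hypothetical archive text; the real EasyPaisa SMS wording is not shown,
# so these messages are assumptions for illustration.
data = """Rs. 1,500.00 received from ALI KHAN. Trx ID 1122334455.

Dear customer, update your app for new features.

You have sent Rs. 250.00 to SARA AHMED. Trx ID 5566778899."""

# Assumed amount format: "Rs." or "Rs", then digits with optional
# thousands separators and an optional decimal part.
AMOUNT_RE = re.compile(r"Rs\.?\s*([\d,]+(?:\.\d+)?)")
TRANSACTION_KEYWORDS = ("received", "sent")  # hypothetical filter words

transaction_message = []
for message in data.split("\n\n"):
    text = message.lower()
    if not any(keyword in text for keyword in TRANSACTION_KEYWORDS):
        continue  # skip promotional or informational messages
    match = AMOUNT_RE.search(message)
    if match:
        amount = float(match.group(1).replace(",", ""))
        direction = "in" if "received" in text else "out"
        transaction_message.append(
            {"amount": amount, "direction": direction, "text": message}
        )

for row in transaction_message:
    print(row["direction"], row["amount"])
```

The substring filter is deliberately simple. A word such as "consent" would also match `sent`, so a real parser would probably anchor the keywords with word boundaries.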

DanielOX / NLP_Feature_Extraction_SKLearn.py

Last active May 20, 2020 15:34

Natural Language Feature Extraction | Bag of Words Using scikit-learn in Python

	from sklearn.feature_extraction.text import CountVectorizer

	# Corpus source: https://en.wikipedia.org/wiki/Baseball

	corpus = """Baseball is a bat-and-ball game played between two opposing teams who take turns batting and fielding. The game proceeds when a player on the fielding team, called the pitcher, throws a ball which a player on the batting team tries to hit with a bat. The objective of the offensive team (batting team) is to hit the ball into the field of play, allowing its players to run the bases, having them advance counter-clockwise around four bases to score what are called "runs". The objective of the defensive team (fielding team) is to prevent batters from becoming runners, and to prevent runners' advance around the bases.[2] A run is scored when a runner legally advances around the bases in order and touches home plate (the place where the player started as a batter). The team that scores the most runs by the end of the game is the winner."""

	# Tokenize corpus into list of sentences beca
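To make the output of `CountVectorizer` concrete, here is a dependency-free sketch that mimics its defaults: lowercasing, and the token pattern `(?u)\b\w\w+\b`, which drops one-character tokens. It builds the alphabetically ordered vocabulary and one count row per sentence. The gist's own sentence-tokenization step is cut off, so the naive splitter here (break after `.`, `!` or `?` followed by whitespace) is an assumption, and `split_sentences` and `bag_of_words` are hypothetical helper names. With scikit-learn installed, `CountVectorizer().fit_transform(sentences).toarray()` should yield the same counts.

```python
import re

# CountVectorizer's default token pattern: tokens of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def split_sentences(text: str) -> list:
    # Naive sentence splitter (assumption): break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bag_of_words(sentences: list):
    """Return (vocabulary, matrix), approximating CountVectorizer defaults."""
    tokenized = [TOKEN_RE.findall(s.lower()) for s in sentences]
    # Vocabulary terms are sorted alphabetically and mapped to column indices.
    terms = sorted({tok for toks in tokenized for tok in toks})
    vocabulary = {term: i for i, term in enumerate(terms)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            row[vocabulary[tok]] += 1
        matrix.append(row)
    return vocabulary, matrix


sentences = split_sentences("The batter hits the ball. The pitcher throws the ball!")
vocabulary, matrix = bag_of_words(sentences)
print(vocabulary)  # {'ball': 0, 'batter': 1, 'hits': 2, 'pitcher': 3, 'the': 4, 'throws': 5}
print(matrix)      # [[1, 1, 1, 0, 2, 0], [1, 0, 0, 1, 2, 1]]
```

Each row is one sentence and each column is one vocabulary term. This is the same document-term layout that the sparse matrix returned by `fit_transform` uses.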

DanielOX / NLP_Feature_Extraction_NLTK.py

Last active May 20, 2020 15:22

Natural Language Feature Extraction | Bag of Words Using NLTK in Python

	import nltk
	import string
	from collections import defaultdict

	# Sample Gutenberg corpus (Jane Austen's Emma) loaded from nltk.corpus;
	# the corpus data must first be downloaded with nltk.download('gutenberg')

	corpus = " ".join(nltk.corpus.gutenberg.words('austen-emma.txt'))

	# Tokenize corpus into sentences
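The imports (`string`, `defaultdict`) suggest a hand-rolled bag of words. Below is a minimal sketch of the counting step, assuming that punctuation-only tokens are discarded and words are lowercased. The short `tokens` list is a hypothetical stand-in for `nltk.corpus.gutenberg.words('austen-emma.txt')`, which returns both word and punctuation tokens. `word_frequencies` is an illustrative helper name, not something defined in the gist.

```python
import string
from collections import defaultdict

# Hypothetical stand-in for nltk.corpus.gutenberg.words('austen-emma.txt'):
# a short mix of word and punctuation tokens.
tokens = ["Emma", "was", "clever", ",", "and", "Emma", "was", "rich", ".", "--"]


def word_frequencies(tokens):
    """Count lowercased word tokens, skipping punctuation-only tokens."""
    counts = defaultdict(int)
    for token in tokens:
        # Drop tokens made only of punctuation, such as "," or "--".
        if all(ch in string.punctuation for ch in token):
            continue
        counts[token.lower()] += 1
    return counts


freq = word_frequencies(tokens)
# Most frequent words first; ties broken alphabetically.
top = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top)  # [('emma', 2), ('was', 2), ('and', 1)]
```

On the real corpus, once the data is downloaded, `word_frequencies(nltk.corpus.gutenberg.words('austen-emma.txt'))` would apply the same counting to the full novel.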