diogo diogobaltazar

  • Novo Nordisk
  • United Kingdom
@diogobaltazar
diogobaltazar / test.py
Last active December 16, 2019 14:46
Transform columns with condition on rows
> df = spark.createDataFrame(
    [(1, 0), (3, 0)],
    ("a", "b")
)
> transf_column(df, F.col('a') + F.col('a'), 'a').show()
+---+---+
|  a|  b|
+---+---+
|  2|  0|
|  6|  0|
+---+---+
// Object Property Value Shorthand
let cat = 'Miaow';
let dog = 'Woof';
let bird = 'Peet peet';
let someObject = {
  cat,
  dog,
  bird
};
diogobaltazar / readme.md
Created June 29, 2019 11:49
one-to-many, many-to-one, many-to-many, Django ORM

ORM | Django

one-to-many, many-to-one, many-to-many relations in Django

diogobaltazar / py.py
Last active June 20, 2019 22:48
Args and Kwargs
>>> def f(fst, *rest):  # usually *args
...     for this in rest:
...         print(this)
...
>>> f(1, 2, 3, 4)
2
3
4
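The gist title also mentions kwargs, which the preview above does not reach. As a minimal sketch (the function name `describe` and its arguments are illustrative, not from the gist), `**kwargs` collects extra keyword arguments into a dict in the same way `*args` collects extra positional arguments into a tuple:

```python
def describe(first, *args, **kwargs):
    """Return the pieces of a call so the packing is easy to inspect."""
    return first, args, kwargs


first, args, kwargs = describe(1, 2, 3, sep="-", end="!")
print(first)   # 1
print(args)    # (2, 3)
print(kwargs)  # {'sep': '-', 'end': '!'}

# The same symbols unpack a tuple and a dict at the call site.
values = (1, 2, 3)
options = {"sep": "-"}
print(describe(*values, **options))  # (1, (2, 3), {'sep': '-'})
```

The names `args` and `kwargs` are only a convention; the `*` and `**` prefixes do the work.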
diogobaltazar / udfs.py
Last active September 16, 2021 03:13
PySpark UDF | decorators | currying | map, filter, reduce
# A UDF is applied to each element of a column, not to the column as a whole,
# but it is called with Column arguments (pyspark.sql.Column)
# and declares its return type with a type from pyspark.sql.types.
>>> from pyspark.sql import functions as F
>>> from pyspark.sql.types import StringType
>>> def f(c1, c2):
...     return str(c1) + str(c2)
...
>>> fu = F.udf(f, StringType())
>>> df = spark.createDataFrame([(1, 'a'), (1, 'b'), (2, 'd')], ['c1', 'c2'])
>>> df.withColumn('test', fu(df.c1, df.c2)).show()
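The title of this gist also lists currying and map, filter, reduce, which the preview does not show. The sketch below uses plain Python lists rather than Spark, so it only illustrates the element-by-element idea; `curry2` and `concat` are hypothetical helper names, and `rows` simply reuses the tuples from the DataFrame above.

```python
from functools import reduce


def curry2(func):
    """Turn a two-argument function into a chain of one-argument functions."""
    def take_first(a):
        def take_second(b):
            return func(a, b)
        return take_second
    return take_first


def concat(c1, c2):
    # Same element-level logic as f in the UDF snippet.
    return str(c1) + str(c2)


rows = [(1, 'a'), (1, 'b'), (2, 'd')]

# Currying: fix the first argument, then map over the second.
curried = curry2(concat)
prefixed = list(map(curried(1), ['a', 'b']))
print(prefixed)  # ['1a', '1b']

# map: apply concat to every row, one element pair at a time.
joined = list(map(lambda row: concat(*row), rows))
print(joined)  # ['1a', '1b', '2d']

# filter: keep rows whose first value is 1.
only_ones = list(filter(lambda row: row[0] == 1, rows))
print(only_ones)  # [(1, 'a'), (1, 'b')]

# reduce: fold the first values into a single sum.
total = reduce(lambda acc, row: acc + row[0], rows, 0)
print(total)  # 4
```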
diogobaltazar / py.py
Last active December 13, 2019 14:40
PySpark SQL comparison
# DATA ########################################################
''' test table
['str_col', 'int_col']
('abc', 1)
('def1', 2)
('def1', 3)
('def1', 3)
+-------+-------+
|str_col|int_col|
+-------+-------+
diogobaltazar / mod1.py
Last active May 26, 2019 17:34
modules
if __name__ == '__main__':
    print('mod1 is executing because it was run directly')
else:
    # On import, __name__ is this module's own name, not the importer's.
    print('mod1 is executing because it was imported; __name__ is ' + __name__)

def f():
    if __name__ == '__main__':
        print('mod1.f is executing with mod1 run as the main script')
    else:
        print('mod1.f is executing from module ' + __name__)
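One detail worth checking here: inside mod1, `__name__` is always mod1's own name, never the name of the module that imported it. The sketch below simulates both ways of running the module without touching the filesystem; the `source` string and the variable `observed_name` are made up for the demonstration.

```python
import types

# A one-line stand-in for the body of mod1.
source = "observed_name = __name__"

# Simulate "import mod1": the module namespace carries its own name.
imported = types.ModuleType("mod1")
exec(source, imported.__dict__)

# Simulate "python mod1.py": the interpreter sets __name__ to '__main__'.
as_script = {"__name__": "__main__"}
exec(source, as_script)

print(imported.observed_name)      # mod1
print(as_script["observed_name"])  # __main__
```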
diogobaltazar / src.py
Last active June 13, 2019 23:18
Python decorators | passing functions as arguments
def f(func=None):
    if func is not None:
        func()
    print('executing f')
    def g():
        print('executing g')
    return g
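Passing a function in and returning a new function is what the `@decorator` syntax builds on. A minimal sketch, assuming a hypothetical decorator `logged` and a module-level list `log` that are not part of the gist:

```python
from functools import wraps

log = []


def logged(func):
    """Decorator: record the call, then delegate to the wrapped function."""
    @wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        log.append('executing ' + func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def g():
    return 'g result'


def h():
    return 'h result'


# The @ syntax is shorthand for this explicit reassignment.
h = logged(h)

print(g())  # g result
print(h())  # h result
print(log)  # ['executing g', 'executing h']
```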

scala

$ to reference this

java

a = false ? 1 : 0;
diogobaltazar / spark.py
Last active October 20, 2022 16:35
pyspark | spark.sql, SparkSession | dataframes
# Row, Column, DataFrame, value are different concepts, and operating over DataFrames requires
# understanding these differences well.
#
# withColumn + UDF | must receive Column objects in the udf
# select + UDF | udf behaves as a mapping
from pyspark.sql import SparkSession