Object-Oriented Programming for Data Modeling

Block 2: Programming and Database Skills

Topic 2.3 · 3 Objectives

2.3.1 Basic OOP for Data Modeling

Object-Oriented Programming (OOP) lets you bundle related data and behavior into classes. In data analysis workflows, classes model real-world entities such as records, responses, configurations, and exporters. Understanding how to define and use classes is essential for the exam.

Defining Classes with the class Keyword

A class is a blueprint for creating objects. You define a class using the class keyword, followed by a name (conventionally in PascalCase) and a colon:

    class DataRecord:
        """Represents a single data record from a survey."""

        def __init__(self, record_id, values):
            # Constructor: called automatically when an object is created
            self.record_id = record_id  # instance variable
            self.values = values        # instance variable (a dict)
            self.is_valid = True        # default attribute
            self.invalid_reason = None  # set later by invalidate()

        def field_count(self):
            """Return the number of fields in this record."""
            return len(self.values)

        def get_field(self, name, default=None):
            """Safely retrieve a field value."""
            return self.values.get(name, default)

        def invalidate(self, reason):
            """Mark the record as invalid."""
            self.is_valid = False
            self.invalid_reason = reason

    # Creating instances
    rec = DataRecord(101, {"name": "Alice", "score": 87})
    print(rec.record_id)           # 101
    print(rec.field_count())       # 2
    print(rec.get_field("score"))  # 87

Key Points for the Exam: __init__ is the constructor; it runs automatically when you create an object. Its first parameter is named self by convention and refers to the instance being created. Every attribute assigned to self becomes an instance variable unique to that object.
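To make the last point concrete, here is a minimal sketch showing that each object carries its own instance variables. The class name Reading and its fields are hypothetical and chosen only for illustration.

```python
class Reading:
    """Hypothetical class used only to illustrate per-object state."""

    def __init__(self, sensor, value):
        # __init__ runs once for every Reading(...) call
        self.sensor = sensor
        self.value = value


r1 = Reading("A", 20.5)
r2 = Reading("B", 31.0)

r1.value = 99.9  # changes r1 only

print(r1.value)  # 99.9
print(r2.value)  # 31.0
print(vars(r2))  # {'sensor': 'B', 'value': 31.0}
```

vars(obj) returns the dictionary that holds an object's instance variables, which makes it a handy way to inspect state while debugging.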

Instance Methods

Instance methods are functions defined inside a class that operate on a specific object. They always take self as their first parameter, giving them access to the object's attributes and other methods:

    class DataRecord:
        def __init__(self, record_id, values):
            self.record_id = record_id
            self.values = values

        def summary(self):
            """Return a summary string for this record."""
            fields = ", ".join(self.values.keys())
            return f"Record {self.record_id}: [{fields}]"

        def merge(self, other_values):
            """Merge additional fields into the record."""
            self.values.update(other_values)

    rec = DataRecord(1, {"age": 30})
    rec.merge({"city": "Amman"})
    print(rec.summary())  # Record 1: [age, city]
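It can help to see where self comes from. The sketch below uses a hypothetical Counter class to show that calling a method on an instance is equivalent to calling the function on the class and passing the instance explicitly.

```python
class Counter:
    """Hypothetical class; the name and method are illustrative only."""

    def __init__(self):
        self.count = 0

    def add(self, n):
        self.count += n
        return self.count


c = Counter()
c.add(2)           # Python passes c as self automatically
Counter.add(c, 3)  # the same kind of call, written out explicitly

print(c.count)  # 5
```

The second form is rarely written in practice; it is shown here only to make the role of self visible.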

Encapsulation: _protected and __private

Python uses naming conventions rather than strict access modifiers to signal how attributes should be used:

| Convention | Example | Meaning |
|---|---|---|
| No prefix | self.name | Public: part of the class's external API |
| Single underscore | self._cache | Protected: "internal, don't touch" (not enforced) |
| Double underscore | self.__secret | Private: triggers name mangling (_ClassName__secret) |

    class SensitiveRecord:
        def __init__(self, patient_id, diagnosis):
            self.patient_id = patient_id  # public
            self._diagnosis = diagnosis   # protected (convention only)
            self.__ssn = None             # private (name-mangled)

        def set_ssn(self, ssn):
            if len(ssn) == 9 and ssn.isdigit():
                self.__ssn = ssn
            else:
                raise ValueError("Invalid SSN format")

        def get_masked_ssn(self):
            if self.__ssn:
                return "***-**-" + self.__ssn[-4:]
            return None

    rec = SensitiveRecord("P100", "Flu")
    rec.set_ssn("123456789")
    print(rec.get_masked_ssn())  # ***-**-6789

    # Direct access to __ssn fails:
    # print(rec.__ssn)  -> AttributeError

    # But name-mangled version still accessible (not truly private):
    print(rec._SensitiveRecord__ssn)  # 123456789

Exam Watch: Name mangling with __ double underscores does NOT make an attribute truly private. Python rewrites self.__attr to self._ClassName__attr. This is designed to avoid accidental name collisions in inheritance, not for security.
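The collision-avoidance purpose is easier to see with a parent and a child that both use the same double-underscore name. This is a minimal sketch; BaseLoader and CsvLoader are hypothetical names, not part of the course code.

```python
class BaseLoader:
    def __init__(self):
        self.__state = "base"  # stored as _BaseLoader__state

    def base_state(self):
        return self.__state


class CsvLoader(BaseLoader):
    def __init__(self):
        super().__init__()
        self.__state = "csv"   # stored as _CsvLoader__state, so no clash

    def csv_state(self):
        return self.__state


loader = CsvLoader()
print(loader.base_state())   # base
print(loader.csv_state())    # csv
print(sorted(vars(loader)))  # ['_BaseLoader__state', '_CsvLoader__state']
```

Had both classes used a single underscore (self._state), the subclass assignment would have silently overwritten the parent's value.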

Getters and Setters with @property

The @property decorator lets you expose computed or validated values through what looks, from the outside, like plain attribute access:

    class TemperatureReading:
        def __init__(self, celsius):
            # Assign through the property so the setter's validation
            # also applies to the initial value
            self.celsius = celsius

        @property
        def celsius(self):
            """Getter: return the temperature in Celsius."""
            return self._celsius

        @celsius.setter
        def celsius(self, value):
            """Setter: validate before storing."""
            if value < -273.15:
                raise ValueError("Temperature below absolute zero is impossible")
            self._celsius = value  # stored internally

        @property
        def fahrenheit(self):
            """Read-only computed property."""
            return self._celsius * 9 / 5 + 32

    t = TemperatureReading(25)
    print(t.celsius)     # 25 (uses the getter)
    print(t.fahrenheit)  # 77.0 (computed property)

    t.celsius = 30       # uses the setter with validation
    print(t.fahrenheit)  # 86.0

    # t.celsius = -300   -> ValueError
    # t.fahrenheit = 50  -> AttributeError (read-only)

Why Use @property? Properties let you add validation, computation, or logging to attribute access without changing the caller's code. Code that used obj.celsius before the property was added continues to work unchanged.
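The claim that caller code keeps working can be sketched as a before-and-after pair. ReadingV1, ReadingV2, and warm_up are hypothetical names used only for this illustration.

```python
class ReadingV1:
    """Before: celsius is a plain attribute."""

    def __init__(self, celsius):
        self.celsius = celsius


class ReadingV2:
    """After: celsius is a validated property with the same name."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Temperature below absolute zero is impossible")
        self._celsius = value


def warm_up(reading, delta):
    """Caller code: written once, works with either version."""
    reading.celsius = reading.celsius + delta
    return reading.celsius


print(warm_up(ReadingV1(20), 5))  # 25
print(warm_up(ReadingV2(20), 5))  # 25
```

The caller never changes, yet ReadingV2 now rejects impossible values that ReadingV1 would have stored silently.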

Managing Internal Object State

Objects maintain their own internal state, which methods can modify over time. This pattern is common in data pipelines where records pass through processing stages:

    class PipelineRecord:
        def __init__(self, raw_data):
            self.raw_data = raw_data
            self._cleaned = False
            self._validated = False
            self._errors = []

        def clean(self):
            # Strip whitespace from string values
            self.raw_data = {
                k: v.strip() if isinstance(v, str) else v
                for k, v in self.raw_data.items()
            }
            self._cleaned = True

        def validate(self, required_fields):
            for field in required_fields:
                if field not in self.raw_data:
                    self._errors.append(f"Missing: {field}")
            self._validated = True

        @property
        def is_ready(self):
            return self._cleaned and self._validated and not self._errors

    rec = PipelineRecord({"name": " Alice ", "score": 92})
    rec.clean()
    rec.validate(["name", "score"])
    print(rec.is_ready)  # True
    print(rec.raw_data)  # {'name': 'Alice', 'score': 92}
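State tracking matters most when something goes wrong. The following sketch uses a hypothetical, trimmed-down StagedRecord class to show the failure path. Resetting the error list on each validate() call and returning a copy from the errors property are design assumptions of this sketch, not something the course code does.

```python
class StagedRecord:
    """Hypothetical record that tracks its own validation state."""

    def __init__(self, raw_data):
        self.raw_data = raw_data
        self._validated = False
        self._errors = []

    def validate(self, required_fields):
        self._errors = []  # reset so repeated calls do not pile up duplicates
        for field in required_fields:
            if field not in self.raw_data:
                self._errors.append(f"Missing: {field}")
        self._validated = True

    @property
    def is_ready(self):
        return self._validated and not self._errors

    @property
    def errors(self):
        return list(self._errors)  # a copy, so callers cannot edit internal state


rec = StagedRecord({"name": "Alice"})
print(rec.is_ready)  # False (not validated yet)

rec.validate(["name", "score"])
print(rec.is_ready)  # False (validated, but with errors)
print(rec.errors)    # ['Missing: score']

rec.raw_data["score"] = 92
rec.validate(["name", "score"])
print(rec.is_ready)  # True
```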

2.3.2 OOP Patterns for Analysis Workflows

Real data projects often combine multiple classes. Three key OOP patterns appear in this course: composition, inheritance, and polymorphism.

Composition: Nesting Objects Inside Objects

Composition means one class contains instances of another class as attributes. This is a "has-a" relationship. For example, a Survey has multiple Question objects:

    class Question:
        def __init__(self, text, question_type="text"):
            self.text = text
            self.question_type = question_type
            self.responses = []

        def add_response(self, response):
            self.responses.append(response)

        def response_count(self):
            return len(self.responses)

    class Survey:
        def __init__(self, title):
            self.title = title
            self.questions = []  # will hold Question objects

        def add_question(self, text, question_type="text"):
            q = Question(text, question_type)
            self.questions.append(q)
            return q

        def total_responses(self):
            return sum(q.response_count() for q in self.questions)

        def summary(self):
            return {
                "title": self.title,
                "num_questions": len(self.questions),
                "total_responses": self.total_responses()
            }

    # Usage
    survey = Survey("Customer Satisfaction")
    q1 = survey.add_question("How would you rate our service?", "rating")
    q2 = survey.add_question("Any additional comments?", "text")
    q1.add_response(5)
    q1.add_response(4)
    q2.add_response("Great experience!")
    print(survey.summary())
    # {'title': 'Customer Satisfaction', 'num_questions': 2, 'total_responses': 3}

Composition vs Inheritance: Use composition when one object contains another (a Survey has Questions). Use inheritance when one object is a specialized version of another (a CSVExporter is a BaseExporter). The course exam tests both patterns.
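The two relationships can appear side by side in a few lines. In this minimal sketch the names Exporter, TextExporter, and Report are hypothetical; isinstance() is used only to make the "is-a" versus "has-a" distinction visible.

```python
class Exporter:
    def export(self, rows):
        raise NotImplementedError


class TextExporter(Exporter):  # inheritance: a TextExporter IS an Exporter
    def export(self, rows):
        return "\n".join(str(r) for r in rows)


class Report:  # composition: a Report HAS an exporter
    def __init__(self, rows, exporter):
        self.rows = rows
        self.exporter = exporter

    def render(self):
        return self.exporter.export(self.rows)


report = Report([1, 2, 3], TextExporter())
print(isinstance(report.exporter, Exporter))  # True  (is-a)
print(isinstance(report, Exporter))           # False (has-a, not is-a)
print(report.render())
```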

Inheritance: Base Classes and Subclasses

Inheritance creates an "is-a" relationship. A subclass inherits all methods and attributes from its parent and can override or extend them:

    import json

    class BaseExporter:
        """Base class for all data exporters."""

        def __init__(self, data):
            self.data = data  # list of dicts
            self._exported = False

        def validate(self):
            """Check that data is non-empty."""
            if not self.data:
                raise ValueError("No data to export")
            return True

        def export(self):
            """Subclasses must override this method."""
            raise NotImplementedError("Subclasses must implement export()")

        def log_export(self, format_name):
            self._exported = True
            print(f"Exported {len(self.data)} records as {format_name}")

    class CSVExporter(BaseExporter):
        """Exports data as CSV text."""

        def export(self):
            self.validate()
            headers = ",".join(self.data[0].keys())
            rows = [",".join(str(v) for v in row.values()) for row in self.data]
            result = headers + "\n" + "\n".join(rows)
            self.log_export("CSV")
            return result

    class JSONExporter(BaseExporter):
        """Exports data as a JSON string."""

        def __init__(self, data, indent=2):
            super().__init__(data)  # call parent constructor
            self.indent = indent

        def export(self):
            self.validate()
            result = json.dumps(self.data, indent=self.indent)
            self.log_export("JSON")
            return result

    # Usage
    data = [{"name": "Alice", "score": 87}, {"name": "Bob", "score": 92}]

    csv_out = CSVExporter(data)
    print(csv_out.export())
    # Exported 2 records as CSV
    # name,score
    # Alice,87
    # Bob,92

    json_out = JSONExporter(data, indent=4)
    print(json_out.export())
    # Exported 2 records as JSON
    # [
    #     {
    #         "name": "Alice",
    #         "score": 87
    #     },
    #     ...
    # ]

Method Overriding

When a subclass defines a method with the same name as one in the parent class, the subclass version overrides the parent. Use super() when you still need to call the parent's version:

    class MarkdownExporter(BaseExporter):
        def validate(self):
            # Override: add extra validation, then call parent
            super().validate()
            for row in self.data:
                if not isinstance(row, dict):
                    raise TypeError("Each row must be a dict")
            return True

        def export(self):
            self.validate()
            headers = list(self.data[0].keys())
            lines = ["| " + " | ".join(headers) + " |"]
            lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
            for row in self.data:
                lines.append("| " + " | ".join(str(v) for v in row.values()) + " |")
            self.log_export("Markdown")
            return "\n".join(lines)

Polymorphism: Same Interface, Different Behavior

Polymorphism means calling the same method name on different objects and getting behavior specific to each object's class. This is powerful in data pipelines where you process items uniformly without knowing their exact type:

    def run_export_pipeline(exporters):
        """Process any list of exporters - polymorphism in action."""
        results = {}
        for exporter in exporters:
            # Each exporter has .export() but implements it differently
            class_name = type(exporter).__name__
            results[class_name] = exporter.export()
        return results

    data = [{"name": "Alice", "score": 87}, {"name": "Bob", "score": 92}]
    exporters = [
        CSVExporter(data),
        JSONExporter(data),
        MarkdownExporter(data),
    ]

    # Polymorphic call: same .export() method, different output formats
    all_results = run_export_pipeline(exporters)
    for fmt, output in all_results.items():
        print(f"\n--- {fmt} ---")
        print(output)

Polymorphism in Practice: In the example above, run_export_pipeline does not need to check the type of each exporter. It simply calls .export(), and each subclass provides its own implementation. This is the core idea behind polymorphism and a common exam topic.
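One practical consequence is that a new format can be added without editing the pipeline function at all. The sketch below is self-contained and uses hypothetical names (Exporter, CommaExporter, TabExporter, export_all) rather than the course classes.

```python
class Exporter:
    def __init__(self, rows):
        self.rows = rows

    def export(self):
        raise NotImplementedError("Subclasses must implement export()")


class CommaExporter(Exporter):
    def export(self):
        return ",".join(str(r) for r in self.rows)


def export_all(exporters):
    """Calls .export() without inspecting the concrete type."""
    return [e.export() for e in exporters]


# Added later: a new format. export_all() needs no changes.
class TabExporter(Exporter):
    def export(self):
        return "\t".join(str(r) for r in self.rows)


print(export_all([CommaExporter([1, 2]), TabExporter([1, 2])]))
# ['1,2', '1\t2']
```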

Real-World Data Workflow Example

Combining composition, inheritance, and polymorphism in a single data pipeline:

    class DataProcessor:
        """Base processor - defines the pipeline interface."""

        def process(self, value):
            raise NotImplementedError

    class Trimmer(DataProcessor):
        def process(self, value):
            return value.strip() if isinstance(value, str) else value

    class UpperCaser(DataProcessor):
        def process(self, value):
            return value.upper() if isinstance(value, str) else value

    class Pipeline:
        """Composition: a Pipeline HAS processors."""

        def __init__(self, processors):
            self.processors = processors  # list of DataProcessor objects

        def run(self, value):
            for proc in self.processors:
                value = proc.process(value)  # polymorphic call
            return value

    pipe = Pipeline([Trimmer(), UpperCaser()])
    print(pipe.run(" hello world "))  # HELLO WORLD

2.3.3 Object Identity and Comparisons

Understanding the difference between object identity and equality is crucial for avoiding subtle bugs and is a frequent exam topic.

Reference Variables: Shared vs Independent Objects

When you assign an object to a new variable, you create a new reference to the same object in memory, not a copy:

    import copy

    class Bucket:
        def __init__(self, items):
            self.items = items

    # Both variables point to the SAME object
    a = Bucket([1, 2, 3])
    b = a  # b is an alias for a
    b.items.append(4)
    print(a.items)  # [1, 2, 3, 4] - a is also affected!
    print(a is b)   # True - same object in memory

    # To make an independent copy:
    c = copy.copy(a)  # shallow copy
    c.items.append(5)
    print(a.items)  # [1, 2, 3, 4, 5] - still shared list! (shallow)

    d = copy.deepcopy(a)  # deep copy - fully independent
    d.items.append(6)
    print(a.items)  # [1, 2, 3, 4, 5] - a is NOT affected
    print(d.items)  # [1, 2, 3, 4, 5, 6]

Aliasing Pitfall: Assigning one variable to another always creates a shared reference, never a copy. When the object holds mutable attributes (lists, dicts), a change made through one reference is visible through every alias. This is one of the most common sources of bugs in data pipelines.
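In pipelines the alias is often created implicitly, by passing an object to a function. Here is a minimal sketch with two hypothetical helper functions, one that mutates its argument and one that works on a copy.

```python
def add_flag_in_place(record):
    """Mutates the dict it receives; every alias sees the change."""
    record["checked"] = True
    return record


def add_flag_copy(record):
    """Works on a shallow copy; the caller's dict is left alone."""
    updated = dict(record)
    updated["checked"] = True
    return updated


raw = {"name": "Alice"}

safe = add_flag_copy(raw)
print(raw)          # {'name': 'Alice'}
print(safe is raw)  # False

same = add_flag_in_place(raw)
print(raw)          # {'name': 'Alice', 'checked': True}
print(same is raw)  # True
```

Neither style is wrong in itself. What matters is that a function's behavior (mutate or copy) is deliberate and documented.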

Mutation of Lists Inside Objects (Aliasing)

A particularly tricky case occurs when a mutable list is passed into an object's constructor:

    shared_list = [1, 2, 3]

    class Container:
        def __init__(self, data):
            self.data = data  # stores a REFERENCE, not a copy

    c1 = Container(shared_list)
    c2 = Container(shared_list)
    c1.data.append(99)
    print(c2.data)      # [1, 2, 3, 99] - c2 is also affected!
    print(shared_list)  # [1, 2, 3, 99] - original list too!

    # Fix: copy the list in the constructor
    class SafeContainer:
        def __init__(self, data):
            self.data = list(data)  # make a copy!

    # Pass the SAME list to both objects to show that the copy protects them
    shared = [10, 20]
    s1 = SafeContainer(shared)
    s2 = SafeContainer(shared)
    s1.data.append(30)
    print(s2.data)  # [10, 20] - unaffected
    print(shared)   # [10, 20] - original list unaffected too
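One caveat about the list(data) fix: it makes a shallow copy, so nested mutable items are still shared. The following sketch, which uses made-up sample data, contrasts it with copy.deepcopy.

```python
import copy

rows = [[1, 2], [3, 4]]     # a list of lists

shallow = list(rows)        # new outer list, same inner lists
deep = copy.deepcopy(rows)  # new outer list and new inner lists

rows[0].append(99)          # mutate an inner list
rows.append([5, 6])         # mutate the outer list

print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

For flat lists of immutable values (numbers, strings), list(data) is enough. For nested structures such as a list of dicts, a deep copy may be the safer choice, at the cost of extra time and memory.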

Comparing with == vs is

Python provides two distinct comparison operators:

| Operator | Checks | Question It Answers |
|---|---|---|
| == | Equality (value) | "Do these have the same content?" |
| is | Identity (memory address) | "Are these the exact same object?" |

    a = [1, 2, 3]
    b = [1, 2, 3]
    c = a

    print(a == b)  # True - same values
    print(a is b)  # False - different objects in memory
    print(a is c)  # True - same object (c is an alias for a)

    # Special case: None should always be checked with 'is'
    x = None
    print(x is None)  # True (preferred)
    print(x == None)  # True (works but not Pythonic)

    # Small integers (-5 to 256) are cached by CPython (implementation detail):
    a = 256
    b = 256
    print(a is b)  # True (cached)

    a = 257
    b = 257
    print(a is b)  # May be False (not guaranteed to be cached)

Exam Rule of Thumb: Use is only for None checks (if x is None) and when you specifically need identity. Use == for all value comparisons. Never rely on is for integer or string comparison in production code.
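Two short cases illustrate why is None is the preferred check. The AlwaysEqual class and the describe function are hypothetical and exist only to make the difference observable.

```python
class AlwaysEqual:
    """Hypothetical class with a deliberately over-eager __eq__."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True  - __eq__ was consulted and gave a misleading answer
print(obj is None)  # False - identity cannot be overridden


def describe(score):
    # 'is None' separates "missing" from legitimate falsy values such as 0
    if score is None:
        return "missing"
    return f"score={score}"


print(describe(None))  # missing
print(describe(0))     # score=0
```

A test such as "if not score" would have treated a real score of 0 as missing, which is a common bug when cleaning numeric data.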

Custom Equality with __eq__()

By default, == on custom objects checks identity (same as is). You can override this by implementing the __eq__ dunder method:

    class DataPoint:
        def __init__(self, x, y, label=""):
            self.x = x
            self.y = y
            self.label = label

        def __eq__(self, other):
            """Two DataPoints are equal if they have the same coordinates."""
            if not isinstance(other, DataPoint):
                return NotImplemented
            return self.x == other.x and self.y == other.y

        def __repr__(self):
            return f"DataPoint(x={self.x}, y={self.y}, label='{self.label}')"

        def __str__(self):
            return f"({self.x}, {self.y})"

    p1 = DataPoint(3, 5, "origin")
    p2 = DataPoint(3, 5, "copy")
    p3 = DataPoint(1, 2)

    # __eq__ compares x and y only (not label)
    print(p1 == p2)  # True (same coordinates)
    print(p1 == p3)  # False (different coordinates)
    print(p1 is p2)  # False (different objects in memory)

    # __repr__ vs __str__
    print(repr(p1))  # DataPoint(x=3, y=5, label='origin')
    print(str(p1))   # (3, 5)
    print(p1)        # (3, 5) - print() calls __str__

    # __repr__ is used in lists and debugging
    print([p1, p3])
    # [DataPoint(x=3, y=5, label='origin'), DataPoint(x=1, y=2, label='')]
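A related detail worth knowing: a class that defines __eq__ without also defining __hash__ has its __hash__ set to None, so its instances cannot be placed in a set or used as dict keys. The sketch below uses hypothetical Point and HashablePoint classes to show the effect and one possible remedy.

```python
class Point:
    """Defines __eq__ only, so instances are unhashable."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class HashablePoint(Point):
    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.x, self.y))


try:
    {Point(1, 2)}
except TypeError as exc:
    print("TypeError:", exc)

unique = {HashablePoint(1, 2), HashablePoint(1, 2), HashablePoint(3, 4)}
print(len(unique))  # 2
```

If you add __hash__, treat the hashed fields as read-only after creation. Changing them while the object sits in a set or dict leads to lookups that silently fail.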

__repr__ and __str__ Methods

| Method | Purpose | Called By |
|---|---|---|
| __repr__ | Unambiguous, developer-facing representation | repr(), the interactive shell, containers such as lists |
| __str__ | Readable, user-facing representation | str(), print() |

__repr__ vs __str__: If you implement only one, implement __repr__. Python falls back to __repr__ when __str__ is not defined, but never the other way around. The __repr__ output should ideally be valid Python that could recreate the object.
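Both claims can be checked in a few lines. In this minimal sketch the Tag class is hypothetical. The eval() call is there only to demonstrate that the repr is valid Python and should never be applied to untrusted input.

```python
class Tag:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight

    def __repr__(self):
        # !r inserts repr(value), so strings keep their quotes
        return f"Tag({self.name!r}, weight={self.weight!r})"


t = Tag("urgent", weight=3)
print(t)       # Tag('urgent', weight=3)  - print() falls back to __repr__
print(str(t))  # Tag('urgent', weight=3)

clone = eval(repr(t))  # works because the repr is valid Python
print(clone.name, clone.weight)  # urgent 3
```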

A complete example putting identity and comparison together:

    class SurveyResponse:
        def __init__(self, respondent_id, answers):
            self.respondent_id = respondent_id
            self.answers = dict(answers)  # defensive copy

        def __eq__(self, other):
            if not isinstance(other, SurveyResponse):
                return NotImplemented
            return self.respondent_id == other.respondent_id

        def __repr__(self):
            return f"SurveyResponse({self.respondent_id!r}, {self.answers!r})"

        def __str__(self):
            return f"Response from {self.respondent_id} ({len(self.answers)} answers)"

    r1 = SurveyResponse("U100", {"q1": 5, "q2": 3})
    r2 = SurveyResponse("U100", {"q1": 4, "q2": 2})

    print(r1 == r2)  # True (same respondent_id)
    print(r1 is r2)  # False (different objects)
    print(r1)        # Response from U100 (2 answers)
    print(repr(r1))  # SurveyResponse('U100', {'q1': 5, 'q2': 3})

Practice Quiz: OOP for Data Modeling

Q1. What is the purpose of the __init__ method in a Python class?

A) It deletes the object from memory when it is no longer needed
B) It is the constructor that initializes instance attributes when an object is created
C) It is a static method that creates the class itself
D) It defines class-level variables shared by all instances
Explanation: __init__ is the constructor method. It is automatically called when a new instance is created and is used to set up the object's initial state through self.attribute = value assignments.

Q2. What happens when you access obj.__secret from outside the class MyClass, where obj is an instance of MyClass?

A) It returns None
B) It raises an AttributeError because of name mangling
C) It works perfectly since Python has no private attributes
D) It raises a PermissionError
Explanation: Double-underscore attributes trigger name mangling. Python renames __secret to _MyClass__secret internally, so accessing obj.__secret directly raises AttributeError. The attribute still exists as obj._MyClass__secret.

Q3. Which decorator turns a method into a property that can be accessed like an attribute?

A) @staticmethod
B) @classmethod
C) @property
D) @attribute
Explanation: The @property decorator defines a getter that allows a method to be accessed using attribute syntax (e.g., obj.name instead of obj.name()). You can also define a setter with @name.setter.

Q4. Consider the code below. What does it print?

    class Box:
        def __init__(self, items):
            self.items = items

    a = Box([1, 2])
    b = a
    b.items.append(3)
    print(a.items)

A) [1, 2]
B) [1, 2, 3]
C) [3]
D) An error is raised
Explanation: b = a does not create a copy; it creates an alias. Both a and b reference the same object in memory. When b.items.append(3) modifies the list, the change is visible through a as well.

Q5. Which OOP concept is demonstrated when a Survey class contains a list of Question objects?

A) Inheritance
B) Composition
C) Polymorphism
D) Encapsulation
Explanation: Composition is a "has-a" relationship where one class contains instances of another class as attributes. A Survey has Question objects. This is different from inheritance, which is an "is-a" relationship.

Q6. What is the output of the following code?

    class Animal:
        def speak(self):
            return "..."

    class Dog(Animal):
        def speak(self):
            return "Woof"

    class Cat(Animal):
        def speak(self):
            return "Meow"

    animals = [Dog(), Cat(), Dog()]
    print([a.speak() for a in animals])

A) ["...", "...", "..."]
B) ["Woof", "Meow", "Woof"]
C) An error because Animal.speak() is overridden
D) ["Woof", "Woof", "Meow"]
Explanation: This is polymorphism. Each object in the list calls its own class's version of speak(). The Dog objects return "Woof" and the Cat object returns "Meow". The base class method is overridden by each subclass.

Q7. What is the difference between == and is in Python?

A) They are identical and can always be used interchangeably
B) == checks identity while is checks equality
C) == checks value equality while is checks if two references point to the same object
D) is only works with numbers and strings
Explanation: The == operator compares values (calling __eq__). The is operator compares identity — whether two variables reference the exact same object in memory. Use is primarily for None checks.

Q8. What does super().__init__(data) do inside a subclass constructor?

A) It creates a new parent class object
B) It calls the parent class's __init__ method to initialize inherited attributes
C) It copies all methods from the parent class
D) It makes the subclass a static class
Explanation: super() returns a proxy object for the parent class. Calling super().__init__(data) invokes the parent's constructor so that any attributes or setup logic defined in the parent class are properly initialized in the subclass instance.

Q9. If a class defines __repr__ but NOT __str__, what does print(obj) use?

A) It raises an AttributeError
B) It prints the memory address only
C) It falls back to __repr__
D) It prints an empty string
Explanation: When __str__ is not defined, Python falls back to __repr__ for print() and str() calls. The reverse is not true — repr() will never fall back to __str__. This is why __repr__ is the more important method to implement.

Q10. Consider the following code. What does p1 == p2 evaluate to?

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    p1 = Point(3, 4)
    p2 = Point(3, 4)
    print(p1 == p2)

A) True — because they have the same x and y values
B) False — because __eq__ is not defined, so it defaults to identity comparison
C) An error is raised because __eq__ is not defined
D) True — because Python compares all attributes automatically
Explanation: Without a custom __eq__ method, Python's default == falls back to identity comparison (same as is). Since p1 and p2 are two different objects in memory, p1 == p2 returns False. You must implement __eq__ for value-based equality.
