Python: 'charmap' codec can't encode character in position: character maps to <undefined>

I was trying to print the top 10,000 rows of a CSV file, which I read into a list using the following code. I used PTVS (Python Tools for Visual Studio).

import csv

with open("myawesomefile") as f:
    reader = csv.reader(f)
    your_list = list(reader)


I was getting the error in the title.

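When I did the following instead, it worked. Most likely the console simply could not print those 10,000 rows: the Windows console's code page (a "charmap" encoding such as cp1252) cannot represent some of the characters in them.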
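A minimal sketch of the failure and one workaround, assuming the console uses the Windows code page cp1252 (the actual code page depends on the machine and on where PTVS sends its output). The sample CSV text and the helper `to_console_safe` are hypothetical. Encoding a character that the code page lacks, such as U+2713, raises the same 'charmap' error; encoding with `errors="replace"` prints `?` instead of crashing.

```python
import csv
import io


def to_console_safe(text: str, encoding: str = "cp1252") -> str:
    """Hypothetical helper: swap characters the code page can't represent for '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical CSV content; U+2713 (check mark) is not in cp1252.
sample = io.StringIO("name,status\nalpha,\u2713\nbeta,ok\n")
your_list = list(csv.reader(sample))

# Reproduce the error: cp1252 is implemented by Python's 'charmap' codec.
try:
    "\u2713".encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)  # prints the 'charmap' UnicodeEncodeError message

# Print the top rows without crashing on unencodable characters.
for row in your_list[:10000]:
    print(to_console_safe(",".join(row)))
```

When reading the real file, it also helps to pass its actual encoding to `open()`, for example `open("myawesomefile", encoding="utf-8", newline="")` if the file is UTF-8 (an assumption), rather than relying on the platform default; `newline=""` is what the `csv` module documentation recommends.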


Immutability of Data in Big Data Systems

Have you ever wondered why Big Data systems, typically batch systems that allow MapReduce on top of them (Hadoop/HBase, the COSMOS store, Google Bigtable), keep their data store immutable?

Fault-tolerance and resilience:

Because data is prone to human error, it is better to write data as is. Once you write data as is, you never have to worry about it being overwritten or lost. Each write is like a new version, and the trail of versions goes on forever. Keeping such a huge trail does create a data explosion, but it may not be as bad as you think, and getting a coherent view on top of this data is simple: for example, take the latest version of each record.
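A toy sketch of the idea, not how Hadoop, HBase, or Bigtable actually store data: every write appends a new immutable record, and the coherent view is derived by replaying the log so that the latest version of each key wins. `Record`, `AppendOnlyStore`, and the sample keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen: a record cannot be modified after it is written
class Record:
    key: str
    value: str
    version: int  # position in the log; later writes get higher versions


class AppendOnlyStore:
    """Hypothetical immutable store: every write appends, nothing is overwritten."""

    def __init__(self) -> None:
        self._log: List[Record] = []

    def write(self, key: str, value: str) -> None:
        self._log.append(Record(key, value, version=len(self._log)))

    def history(self, key: str) -> List[Record]:
        # The full trail of writes for one key is always recoverable.
        return [r for r in self._log if r.key == key]

    def current_view(self) -> Dict[str, str]:
        # Coherent view: replay the log in order so the latest version of each key wins.
        view: Dict[str, str] = {}
        for r in self._log:
            view[r.key] = r.value
        return view


store = AppendOnlyStore()
store.write("user:1", "Alice")
store.write("user:1", "Alicia")  # a correction is a new record, not an overwrite
store.write("user:2", "Bob")
print(store.current_view())  # {'user:1': 'Alicia', 'user:2': 'Bob'}
print(len(store.history("user:1")))  # 2
```

A human error (a bad write) never destroys the earlier value: it is still in the history, and the view can be recomputed.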

The Memory Hierarchy

Thanks to Chris Terman and MIT OpenCourseWare. These notes are from an MIT lecture found here.


0:20 What we want in a memory.
2:25 Technologies for memories. (see table)
5:30 SRAM Memory Cell
10:23 1-T Dynamic RAM
16:00 Hard disk
17:40 Challenge to cope with Quality vs. Quantity
18:15 Key idea: Best of both worlds using Memory hierarchy
20:55 Memory reference patterns. Locality for program, stack and data
24:20 Exploiting the Memory Hierarchy
31:53 The Cache Idea: Program-Transparent Memory Hierarchy
34:16 How high of a Hit Ratio do we need?
36:15 The Cache Principle
46:16 Direct Mapped Cache
47:36 Contention Problem: Contention, Death and Taxes

The professor goes through the low-level details of a memory's interface: the address (ADDR) and the data in/out lines (DIN/DOUT).

Two kinds of memories:

  1. 2-port main memory: one port takes the program counter and returns an instruction; the other port serves load and store instructions, which compute a memory address with an offset to read or write data.
  2. Register file: built into the CPU datapath, it supplies the two register operands for each instruction. It has the same organization as the 2-port memory.

Technologies for memories:

Technology    Capacity            Latency   Cost
Register      100's of bits       20 ps     $$$$
SRAM          100's of Kbytes     1 ns      $$$
DRAM          1000's of Mbytes    40 ns     $
Hard disk*    100's of Gbytes     10 ms     Cents
Desired       1's of Gbytes       1 ns      Cheap

The real bottleneck: if we have to fetch every instruction from main memory, each fetch pays a long latency, no matter how fast the processor is.

In the past, processor speed has improved steadily with CMOS technology. DRAM capacity has also increased as transistors get smaller and smaller, but DRAM latency, which is dictated by the size of the memory, has not improved nearly as dramatically as processor speed.


Static RAM (SRAM) is the technology used in our register file (one of the kinds of memory mentioned above). The professor walks through the transistor-level design of the SRAM cell, which is built from a pair of cross-coupled inverters: a static, bistable storage element. To write a bit, the write drivers "overpower" the inverters, whereas a read only senses the stored value.

We can build multi-port SRAMs: the number of ports can be increased by adding access transistors. By carefully sizing the inverter pair so that one inverter is strong and the other weak, we can ensure that the WRITE bus only fights the weaker one, while READs are driven by the stronger one. This minimizes both access and write times.

1-T Dynamic Ram

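DRAM is a high-capacity memory with a much simpler cell. Six transistors per cell (as in SRAM) may not sound like much, but they add up quickly. So what is the fewest number of transistors that can store a bit? A 1-T DRAM cell uses a single access transistor plus a capacitor that holds the bit as charge. The capacitance is increased by more area, a better dielectric, and a thinner film, following the parallel-plate formula C = εA/d (ε: dielectric permittivity, A: plate area, d: film thickness).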

The interesting idea here is that the capacitor's charge slowly leaks away, so every 10ms or so your computer reads all the data in memory and writes it back again (refreshes it) so that it does not get lost.

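One trick increases throughput using the idea of pipelining: send the address over in a couple of chunks (a row address, then a column address). Once a row is open, further accesses to that row only need a new column address.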
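A simplified sketch of the chunked addressing, assuming a hypothetical DRAM with 1024 (2^10) columns per row and ignoring real SDRAM timing: the high-order bits form the row chunk and the low-order bits the column chunk. A run of neighboring addresses pays for the row once and then only sends column addresses, which is where the throughput gain comes from. `split_address` and `count_row_activations` are illustrative names.

```python
from typing import List, Optional, Tuple

COL_BITS = 10  # assumption: 2**10 = 1024 columns per DRAM row


def split_address(addr: int) -> Tuple[int, int]:
    """Split a cell address into the (row, column) chunks sent separately."""
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)


def count_row_activations(addresses: List[int]) -> int:
    """Count how often a new row chunk must be sent; same-row accesses reuse the open row."""
    open_row: Optional[int] = None
    activations = 0
    for addr in addresses:
        row, _col = split_address(addr)
        if row != open_row:
            activations += 1  # send the row chunk and open that row
            open_row = row
        # a column chunk is sent for every access
    return activations


sequential = list(range(2048, 2048 + 8))  # 8 neighboring cells, all in row 2
scattered = [0, 1 << COL_BITS, 2 << COL_BITS, 3 << COL_BITS]  # 4 different rows
print(split_address(2050))  # (2, 2)
print(count_row_activations(sequential))  # 1
print(count_row_activations(scattered))  # 4
```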

Synchronous DRAM (SDRAM)

Double-clocked Synchronous Memory (DDR)

The idea of DDR RAM is a clocked transfer protocol that moves data on both the rising and the falling edge of the clock, doubling the transfer rate. The reason the machine is slow is that fetching data from this memory system is slow.

Hard disk

  • Average rotational latency = 4ms
  • Average seek time = 9ms
  • Transfer rate = 20Mbytes/sec
  • Capacity = 1TB
  • Cost <= $1/Gbyte
  • Rotational speed: 7,000 to 15,000 RPM

A disk is a stack of platters; the tracks at the same position on every platter form a cylinder. Each track is divided into sectors. The spindle and the read/write heads are mechanical devices, which is why access is slow. Information is stored in concentric circles (tracks) to minimize movement of the head.
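A back-of-the-envelope sketch using the numbers above, assuming a 4 Kbyte read (a hypothetical request size) and 1 Mbyte = 10^6 bytes. Average rotational latency is the time for half a revolution; at 7,000 RPM that is about 4.3ms, consistent with the 4ms listed. The mechanical seek and rotation dominate: the transfer itself adds only about 0.2ms.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the sector is half a revolution away."""
    ms_per_rev = 60_000 / rpm
    return ms_per_rev / 2


def access_time_ms(seek_ms: float, latency_ms: float, nbytes: int, mb_per_s: float) -> float:
    """Estimated time to read nbytes: seek + rotational latency + transfer."""
    transfer_ms = nbytes / (mb_per_s * 1_000_000) * 1000
    return seek_ms + latency_ms + transfer_ms


print(round(rotational_latency_ms(7000), 2))  # 4.29
print(round(access_time_ms(9, 4, 4096, 20), 2))  # 13.2
```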

Quantity vs Quality

  • Your memory can be BIG and slow... or...
  • SMALL and FAST.

Is there an architectural solution to this DILEMMA?

We can nearly get our wish.

KEY: Use a hierarchy of memory technologies

Key Idea

  • Keep the most often-used data in a small, fast SRAM (often local to CPU chip)
  • Refer to Main Memory only rarely, for remaining data.
  • The reason this strategy works: LOCALITY

Researchers have found statistical patterns in how programs reference memory. See the diagram (21:03).

Program: instructions are mostly fetched sequentially; branches, usually if-else statements that split the program path, also affect speed.

Stack: at any given moment, a program uses only a small part of the stack, namely the activation record of the current subroutine.

Data: Copying data from one data structure to another or performing computation on it.

Exploiting the Memory Hierarchy

Approach 1 (Cray, others): Expose the Hierarchy

The hardware designers' answer: SMOP ("a Simple Matter Of Programming"). As hardware designers get lazy, they push programmers to write smarter programs. Until fairly recently, Cray supercomputers built this way were the fastest machines on earth. Seymour Cray's argument was that you cannot fake something you do not have, namely a huge, fast memory.

    • Registers, main memory, and disk are each available as storage alternatives
    • Tell programmers: "Use them cleverly"

Approach 2: Hide the Hierarchy

Here the hardware looks over the programmer's shoulder and manages locality of reference itself. It is a layer of abstraction that does the memory management.

    • Programming model: SINGLE kind of memory, single address space
    • Machine AUTOMATICALLY assigns location to fast or slow memory depending on usage patterns.

The CPU looks first in a small static cache (usually L1/L2), then in DRAM, and then on the hard disk. Much of what you pay for in a processor is cache memory, and the size of the cache matters. Ideally, you want most information to be found in the yellow box of the diagram (the small static cache).

The Cache Idea: Program-Transparent Memory Hierarchy

Cache contains “temporary copies” of selected main memory locations.

The challenge is to make the hit ratio as high as possible.


  • Improve the average access time
    • HIT RATIO : Fraction of refs found in CACHE
    • MISS RATIO: Remaining references
  • Transparency (compatibility, programming ease)

How High of a Hit Ratio?

Suppose we can easily build an on-chip static memory with a 4ns access time, but the fastest DRAM we can buy for main memory has an average access time of 40ns. How high a hit ratio do we need to sustain an average access time of 5ns (only slightly slower than the cache)?

Every access checks the cache (4ns), and a miss additionally pays the DRAM time, so t_avg = 4 + (1 - α)·40. Setting t_avg = 5 gives 1 - α = 1/40, so α = 97.5%: over 97% of the time the instruction should be in the small cache. Over any period of time, the processor works on a subset of instructions that it needs for its computation. If the cache is big enough to hold that subset, we can achieve this hit ratio. The time the CPU spends processing should be balanced against the time it takes to load the misses.
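The same calculation as a small sketch, using the model in which every access checks the cache and a miss additionally pays the DRAM time: t_avg = t_cache + (1 - α)·t_mem. The function names are illustrative; the last line shows how much a 90% hit ratio would cost.

```python
def average_access_time(hit_ratio: float, t_cache: float, t_mem: float) -> float:
    """Every access checks the cache; a miss additionally pays the main-memory time."""
    return t_cache + (1 - hit_ratio) * t_mem


def required_hit_ratio(target: float, t_cache: float, t_mem: float) -> float:
    """Solve target = t_cache + (1 - alpha) * t_mem for the hit ratio alpha."""
    return 1 - (target - t_cache) / t_mem


alpha = required_hit_ratio(target=5, t_cache=4, t_mem=40)
print(round(alpha, 4))  # 0.975
print(round(average_access_time(alpha, t_cache=4, t_mem=40), 6))  # 5.0
print(round(average_access_time(0.9, t_cache=4, t_mem=40), 6))  # 8.0
```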

The Cache Principle

ALGORITHM: Look nearby for the requested information first; if it's not there, check secondary storage.

Basic cache algorithm:

The cache knows two things: which addresses it holds and their contents. On a hit, the CPU reads or updates the data in the cache, and it is then the cache's responsibility to update main memory. On a miss, the cache has to evict something and replace it with the requested data from main memory.
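A toy sketch of the basic cache algorithm, with assumptions the lecture notes do not make: LRU replacement and a write-back policy (a modified line is copied to main memory only when it is evicted). A Python dict lookup stands in for the associative search, so any address can live in any line. `WriteBackCache` and the sample memory contents are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class WriteBackCache:
    """Toy fully associative cache with LRU replacement and write-back (assumptions)."""

    def __init__(self, num_lines: int, memory: Dict[int, int]) -> None:
        self.num_lines = num_lines
        self.memory = memory  # stands in for main memory (address -> value)
        self.lines: "OrderedDict[int, list]" = OrderedDict()  # address -> [value, dirty]
        self.hits = 0
        self.misses = 0

    def _lookup(self, addr: int) -> list:
        if addr in self.lines:  # hit: the cache knows this address
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        self.misses += 1
        if len(self.lines) >= self.num_lines:  # miss with a full cache: evict the LRU line
            victim, (value, dirty) = self.lines.popitem(last=False)
            if dirty:
                self.memory[victim] = value  # write back only modified data
        self.lines[addr] = [self.memory.get(addr, 0), False]
        return self.lines[addr]

    def read(self, addr: int) -> int:
        return self._lookup(addr)[0]

    def write(self, addr: int, value: int) -> None:
        line = self._lookup(addr)
        line[0], line[1] = value, True  # update the cached copy; memory is updated later


memory = {0: 10, 1: 11, 2: 12}
cache = WriteBackCache(num_lines=2, memory=memory)
cache.write(0, 99)  # miss, then the cached copy is modified
print(memory[0])    # 10: main memory is not updated yet
cache.read(1)       # miss
cache.read(2)       # miss: evicts address 0 (least recently used) and writes it back
cache.read(2)       # hit
print(memory[0])    # 99
print(cache.hits, cache.misses)  # 1 3
```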

Associativity: Parallel Lookup

Look at every line of the cache, all in parallel, to see whether it holds what the CPU is looking for. Any data item can be located in any cache line. Fully associative caches are very expensive: every line needs its own address comparator, and roughly half of the cache's storage is spent holding addresses.

Direct-Mapped Cache

A cheaper alternative to the fully associative cache. It is non-associative: instead of comparing every line in parallel, it uses part of the address as an index into a table, so each address can live in exactly one line and only one comparison is needed. The basic idea is to use a table index to find the location quickly, because doing the same lookup in parallel is expensive.

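Problem: contention and cache conflicts, when several addresses map to the same line. To reduce them, improve the index function: use the low-order bits of the address rather than the high-order bits, since, given locality of reference, the high-order bits of nearby addresses hardly change and would send them all to the same line.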
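A toy direct-mapped cache, assuming one word per line to keep it short: the low-order address bits select the single line an address can occupy, and the remaining high-order bits are stored as a tag and compared once. The first trace (a small loop over neighboring addresses) fits without conflicts, while the second shows contention between two addresses that share their low-order bits. `DirectMappedCache` is a hypothetical name.

```python
from typing import List, Optional


class DirectMappedCache:
    """Toy direct-mapped cache with one word per line (an assumption for brevity)."""

    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        self.tags: List[Optional[int]] = [None] * (1 << index_bits)
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        index = addr & ((1 << self.index_bits) - 1)  # low-order bits pick the line
        tag = addr >> self.index_bits  # high-order bits identify the address
        if self.tags[index] == tag:  # a single comparison, no parallel search
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # replace whatever occupied the only possible line
        return False


# A loop over 4 neighboring words, run 3 times: low-order indexing spreads them out.
c = DirectMappedCache(index_bits=3)
for a in [0, 1, 2, 3] * 3:
    c.access(a)
print(c.hits, c.misses)  # 8 4

# Contention: addresses 0 and 8 share their low-order bits and keep evicting each other.
c = DirectMappedCache(index_bits=3)
for a in [0, 8] * 3:
    c.access(a)
print(c.hits, c.misses)  # 0 6
```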

L1 caches are very small but very fast. They are a few thousand entries long and respond in about a nanosecond (the SRAM latency in the table above).

The next lecture deals with these cache issues and whether there is a happy middle ground.

Fully Associative

  • Expensive
  • Flexible: any address can be cached in any line

Direct Mapped

  • Cheap (ordinary SRAM)
  • Contention: Addresses compete for cache lines.


Both MVP and MVVM are derivatives of MVC (see the timelines of how they have evolved). The key difference between them is how much each layer depends on the other layers and how tightly the layers are bound to each other. See the diagrams and the references column for more details.

These patterns mainly address the problems of structuring code related to (1) application state, (2) business logic, and (3) synchronization between state and view.


Diagram: MVC (View, Controller, Model)

Diagram: MVP (Input)

MVP sits somewhere between MVC and MVVM. In these notes the presenter is also called the Presentation Model (PM).

Diagram: MVVM (View, Input, ViewModel, Model)




Explanation and flow

MVC:
  • A user input, such as clicking a link or entering a URL, is first intercepted by the controller.
  • A controller can output different views based on authorization, validation errors, success, or custom logic. Note the one-to-many relationship between a controller and its views, and the one-way communication from the controller to the view.
  • The controller passes the model to the view, and the view binds itself to it using a templating engine (Razor in the case of ASP.NET MVC).
  • The Model is usually a data object, a POCO (Plain Old CLR Object), with minimal or no methods (behavior).
MVP:
  • A user input begins with the view, not the presenter. The view invokes commands on the presenter, and the presenter in turn modifies the view.
  • View and Model never communicate with or know of (refer to) each other.
  • The Presenter is a layer of abstraction over the View.
  • There is always a one-to-one mapping between a presenter and its view.
  • The Presentation Model and the View talk to each other. The View reads properties and calls methods on the PM. The PM exposes properties and methods for the View and dispatches events, which the View may listen to.
  • The PM talks to the Model in the domain layer, either directly through a reference it holds or indirectly through messages.
MVVM:
  • A user input begins with the view and may end up executing a ViewModel behavior.
  • View and Model never communicate with or know of (refer to) each other.
  • The ViewModel is a strongly typed model for the view that is an exact reflection (metaphorically speaking), or abstraction, of the view.
  • The ViewModel and View are always in sync.
  • The Model has no idea that the View and ViewModel exist, and the ViewModel has no idea that a View exists. This decoupling pays dividends.

References

In C#, a "reference" means that one class uses the other.

In JavaScript, it means that one module references another or, in the case of a View, that the HTML contains a reference to the JavaScript module.

MVC:
  • The View refers to the Model, but not vice versa.
  • The controller refers to the model, populates it, and passes it to the View.
  • The View is oblivious of the controller, but refers to and expects a particular type of Model.
MVP:
  • The Presenter (PM) needs a reference to the View.
  • The View also has a reference to the Presenter, which responds to user events.
  • The Presenter populates the View through its reference, as opposed to the View binding to the Model on every interaction.
  • To decouple them, there is usually an abstract class or an interface that the View and PM share.
MVVM:
  • Unlike the Presenter, a ViewModel does not need a reference to the View. The View binds to properties on the ViewModel.
  • The View has no idea that the Model class exists.
  • The ViewModel and Model are unaware of the View.
  • The Model is completely oblivious to the fact that the ViewModel and View exist.

View

Views are often defined declaratively, often with a tool or a designer (think HTML or XAML).

MVC:
  • Views are responsible for generating the markup, typically using a templating engine or a declarative language (HTML). A view may contain conditional code based on Model properties.
  • Either different Views are used for edit and read modes, or the same View uses conditional logic based on a Model property.
MVP:
  • The View has to expose an interface that the presenter can use.
  • The presenter implements its own interface, providing the methods that the View needs.
  • The View, in turn, uses the interface exposed by the presenter.
MVVM:
  • The View is declarative and contains the data-binding code that refers to the ViewModel.
  • There is two-way binding, and the View is always in sync with the ViewModel.

Examples you may use in views:

  • Formatting a display field (date string)
  • Showing only certain details depending on state. (only show edit if admin)
  • Managing view animations. (on hover, do something)
Controller or Presenter or ViewModel

MVC:
  • A controller (or an area) is reached through a routing engine: a set of rules based on the input URL, or on the API path in the case of AJAX requests.
  • The controller decides which view to display, based on user input or the current state of the user's interaction with the application.
  • The view sends input through a URL, which is intercepted by the routing engine and routed to the appropriate controller.
  • The controller modifies and populates the Model and hands it over to the View.
  • There is typically one action method in a controller for each user interaction and its variants.
MVP:
  • Loosely speaking, the code-behind (.aspx.cs) file represents the presenter. The interface in this case is the Page class that every .aspx.cs file inherits.
  • With composition, a Presentation Model may contain one or more child Presentation Model instances, but each child control still has only one Presentation Model.
MVVM:
  • A ViewModel does not need a reference to the View, which promotes loose coupling and reuse of the same ViewModel across views. Imagine the same ViewModel used for a website, a mobile application, and a tablet application.
  • A ViewModel encapsulates the current state of the view as displayed on screen, as well as the commands or behaviors triggered by events.
  • A ViewModel may act as an adapter that transforms raw model data into a format suitable for display to the user.
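A minimal MVC sketch of the controller flow described above, not how ASP.NET MVC or Spring MVC implement it: a routing table maps a URL to a controller action, the action populates the model and picks one of several views, and the view renders it through a template. `string.Template` stands in for a templating engine such as Razor; `Product`, `ProductController`, `ROUTES`, and `dispatch` are hypothetical.

```python
import re
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List, Tuple


@dataclass
class Product:  # Model: a plain data object with no behavior
    id: int
    name: str


# Views: templates that bind to the data they are given; they never call the controller.
VIEWS: Dict[str, Template] = {
    "details": Template("<h1>$name</h1>"),
    "not_found": Template("<p>Product $id not found</p>"),
}

PRODUCTS = {1: Product(1, "Keyboard")}  # hypothetical data source


class ProductController:
    def details(self, product_id: str) -> str:
        # Action method: populate the model, then pick which view to render.
        product = PRODUCTS.get(int(product_id))
        if product is None:
            return VIEWS["not_found"].substitute(id=product_id)
        return VIEWS["details"].substitute(name=product.name)


# Routing engine: rules that map an incoming URL to a controller action.
ROUTES: List[Tuple[re.Pattern, Callable[..., str]]] = [
    (re.compile(r"^/products/(\d+)$"), ProductController().details),
]


def dispatch(url: str) -> str:
    for pattern, action in ROUTES:
        match = pattern.match(url)
        if match:
            return action(*match.groups())
    return "404"


print(dispatch("/products/1"))  # <h1>Keyboard</h1>
print(dispatch("/products/7"))  # <p>Product 7 not found</p>
```

One controller choosing between several views is the one-to-many relationship mentioned earlier.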

Why do we need ViewModels?

• Incorporating dropdown lists of lookup data into a related entity

• Master-detail records view

• Pagination: combining actual data and paging information

• Components like a shopping cart or user profile widget

• Dashboards, with multiple sources of disparate data

• Reports, often with aggregate data
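A sketch of a ViewModel acting as an adapter for the pagination case: it takes raw models, formats the fields the screen needs (an enum turned into text, a date into a string), and combines them with paging information. The `Order` domain and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List


class OrderStatus(Enum):  # hypothetical domain enum coming from the back end
    PENDING = 1
    SHIPPED = 2


@dataclass
class Order:  # raw Model, shaped for many consumers rather than for this screen
    id: int
    status: OrderStatus
    placed_on: date


@dataclass
class OrderRowViewModel:  # only what one row displays, already formatted
    id: int
    status_text: str
    placed_on_text: str


@dataclass
class OrderPageViewModel:  # combines the data with paging information
    rows: List[OrderRowViewModel]
    page: int
    total_pages: int

    @property
    def has_next(self) -> bool:
        return self.page < self.total_pages


def build_order_page(orders: List[Order], page: int, page_size: int) -> OrderPageViewModel:
    """Adapter: transform raw models into a display-ready, paginated ViewModel."""
    total_pages = max(1, -(-len(orders) // page_size))  # ceiling division
    start = (page - 1) * page_size
    rows = [
        OrderRowViewModel(o.id, o.status.name.title(), o.placed_on.isoformat())
        for o in orders[start:start + page_size]
    ]
    return OrderPageViewModel(rows, page, total_pages)


orders = [
    Order(i, OrderStatus.SHIPPED if i % 2 else OrderStatus.PENDING, date(2014, 1, i))
    for i in range(1, 6)
]
vm = build_order_page(orders, page=1, page_size=2)
print([r.status_text for r in vm.rows], vm.total_pages, vm.has_next)  # ['Shipped', 'Pending'] 3 True
```

The view can then bind to `rows` and `has_next` without knowing anything about `Order` or `OrderStatus`.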


MVP is usually implemented in one of two variants:

  • The passive view implementation, in which the view contains no logic. The view is only a container for UI controls that are directly manipulated by the presenter.


  • The supervising controller implementation, in which the view may be responsible for some elements of presentation logic, such as data binding, and has been given a reference to a data source from the domain models.

(This is closer to MVVM)
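A minimal passive-view MVP sketch with hypothetical names: the View exposes an interface (`LoginView`), the presenter holds a reference to it and contains all of the logic, and the view only forwards events and displays what it is told. The fake view doubles as a way to test the presenter without a GUI.

```python
from typing import List, Protocol


class LoginView(Protocol):
    """The interface the View exposes to the presenter (a hypothetical example)."""

    def get_user_name(self) -> str: ...
    def show_error(self, message: str) -> None: ...
    def show_welcome(self, message: str) -> None: ...


class LoginPresenter:
    """Holds a reference to the view and does all of the presentation logic."""

    def __init__(self, view: LoginView, known_users: List[str]) -> None:
        self.view = view
        self.known_users = known_users

    def on_login_clicked(self) -> None:  # the View forwards user events here
        name = self.view.get_user_name().strip()
        if not name:
            self.view.show_error("User name is required")
        elif name not in self.known_users:
            self.view.show_error(f"Unknown user: {name}")
        else:
            self.view.show_welcome(f"Welcome, {name}")


class FakeLoginView:
    """Passive view: no logic, only widget state that the presenter reads and sets."""

    def __init__(self, user_name: str) -> None:
        self.user_name = user_name
        self.error = ""
        self.welcome = ""
        self.presenter = LoginPresenter(self, known_users=["ada"])  # view -> presenter

    def get_user_name(self) -> str:
        return self.user_name

    def show_error(self, message: str) -> None:
        self.error = message

    def show_welcome(self, message: str) -> None:
        self.welcome = message

    def click_login(self) -> None:  # a button click is simply delegated
        self.presenter.on_login_clicked()


view = FakeLoginView("ada")
view.click_login()
print(view.welcome)  # Welcome, ada
```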


Model

Models are often received from a service or through dependency injection. They usually carry more or less data than the UI needs, in a format that caters to a broader set of consumers.

MVC:
  • Model objects received from the underlying services are raw, in a format that caters to the service's different consumers.
  • The view may use only a small subset of the model, not all of it.
  • There is typically a need to collect different models from services into a single model.
MVP:
  • Typically a domain-layer object that contains domain models, commands, and a subscription service.
MVVM:
  • The Model is typically a server class serialized to JSON or XML and sent over the wire; on the server side, it may be a predefined domain class that is more general than what the view requires.
  • For undoable operations, a ViewModel can refer to the model to restore the original state.
When to use

MVC:
  • Perfect for the web/HTTP, accommodating its stateless nature and addressability.
  • Disconnected, stateless applications.
  • REST-based thin clients, as routing is inherent to this pattern.
  • Mobile applications implemented using HTML5.
MVP:
  • Classic ASP.NET Web Forms.
  • Smart UI or rapid application development.
  • SharePoint web parts.
  • Windows Forms.
  • Migrating from legacy code, where the UI logic is already wired up.
  • Heavy intranet, workflow-based applications.
MVVM:
  • State-heavy web applications or views.
  • Silverlight or rich internet applications.
  • Windows Phone or Android.
  • Highly event-driven and stateful UIs.
  • Two-way binding.
  • UIs where a user interacts with the app for a long time before saving state.
  • Works well where the connection between the view and the rest of the program is not always available.
Patterns and Practices

MVC:
  • Front Controller

Think of the Spring framework.

  • The controller is like the Strategy design pattern.
MVP:
  • Page Controller

Think ASP.NET .aspx pages with a complete page lifecycle (Init, Load, Validation, Event, Render, Unload).

  • The presenter acts as a mediator.
MVVM:
  • Observer or Publish/Subscribe (INotifyPropertyChanged, IObserver)
  • The ViewModel exposes an Observable.
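Python has no built-in INotifyPropertyChanged, so this sketch hand-rolls a minimal observable ViewModel to illustrate the Observer and two-way binding idea; it is not how WPF or Knockout.js implement binding. `ObservableViewModel` and `TextBox` are hypothetical. The ViewModel only knows about generic listener callbacks, never about the widget class, which is what lets two views share one ViewModel.

```python
from typing import Callable, Dict, List


class ObservableViewModel:
    """Minimal stand-in for INotifyPropertyChanged: notifies subscribers on change."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._listeners: List[Callable[[str, object], None]] = []

    def subscribe(self, listener: Callable[[str, object], None]) -> None:
        self._listeners.append(listener)

    def get(self, name: str) -> object:
        return self._values.get(name)

    def set(self, name: str, value: object) -> None:
        if self._values.get(name) == value:
            return  # no change, no notification (this also stops binding loops)
        self._values[name] = value
        for listener in self._listeners:
            listener(name, value)


class TextBox:
    """Hypothetical widget bound two-way to one ViewModel property."""

    def __init__(self, vm: ObservableViewModel, prop: str) -> None:
        self.text = ""
        self.vm, self.prop = vm, prop
        vm.subscribe(self._on_vm_changed)  # ViewModel -> View

    def _on_vm_changed(self, name: str, value: object) -> None:
        if name == self.prop:
            self.text = str(value)

    def type_text(self, text: str) -> None:  # user input: View -> ViewModel
        self.text = text
        self.vm.set(self.prop, text)


vm = ObservableViewModel()
box_a = TextBox(vm, "name")
box_b = TextBox(vm, "name")  # two views of the same ViewModel stay in sync
box_a.type_text("Ada")
print(box_b.text)  # Ada
vm.set("name", "Grace")
print(box_a.text, box_b.text)  # Grace Grace
```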
Framework/Library

Client-side:

MVC:
  • Backbone.js, Knockback.js, Spine.js, AngularJS
MVP:
  • Riot.js
  • GWT
MVVM:
  • Knockout.js
  • Kendo (MVVM)

Server-side and desktop:

MVC:
  • Spring MVC
  • Ruby on Rails
MVP:
  • Classic ASP.NET
  • JSP Servlets
MVVM:
  • WPF (desktop) or Silverlight
  • Windows Phone apps (XAML)
  • Adobe Flex
Pros

MVC:
  • Routing is inherent to this pattern, and the Controller acts as a mediator between presentation (View) and data (Model).
  • Routing gives greater control over the application structure and makes it manageable.
  • The abstractions are properly separated, which enables more control over each layer, especially the view, which is now clearly separated from the state.
  • Separation helps with testability.
MVP:
  • The goal of MVP is to separate state and behavior out of the View, which makes migrating to MVP a good first step for legacy spaghetti applications.
  • Since the presenter is always written against an interface, it provides a GUI-agnostic, testable interface.
  • Imposes a consistent interface pattern that developers can follow.
MVVM:
  • Attempts to clearly separate the declarative UI from the business logic.
  • Promotes parallel development: UI developers write the bindings, while application developers own the Model and ViewModel.
  • Clearly separates the view logic, keeping the view "dumb", with the least amount of logic.
  • In practice, a website, a mobile application, and a tablet application all need different views but can share the same ViewModel.
  • A ViewModel is easier to unit test than event-driven code and keeps UI automation testing out of the way.
  • A ViewModel can be reused for different representations, as it is highly decoupled from the View.
Cons

MVC:
  • Model data coming from the back end typically needs some transformation, as simple as converting an enum to a string or as complex as calculating a number of days from different date properties of the model. Slowly, the view starts holding more and more logic.
  • Mechanisms like ViewBag/ViewData exist and are abused as a substitute for a proper model when the model is small.
  • In practice, the Model from the back-end repository is often unusable as-is because of different property names or data structures. A new abstraction of the Model is created, and mapping this new ViewModel to the Model and managing changes is often painful.
MVP:
  • The pattern seems to work against the constraints of the HTTP web, as it demands heavier bandwidth, which is neither free nor unlimited.
  • The view is still tightly coupled compared to MVC.
  • Debugging events fired from the UI is harder because they are intermingled with the View.
  • It is hard to stick to one MVP variant in all cases, which results in a mixed code base.
  • Work cannot always proceed in parallel, because the interfaces need to be defined and agreed upon first.
  • For simpler applications it is overkill.
MVVM:
  • Compared to MVC, the declarative bindings in MVVM make it harder to debug.
  • For simple controls, the data-binding code can outweigh the data itself.
  • Data-binding implementations keep a lot of in-memory bookkeeping.
  • It is unclear whether View development drives the ViewModel or vice versa, which makes communication harder.
  • Sometimes criticized because markup and JS code (the data bindings) are intermixed. Unmanaged data binding can consume considerable memory.
  • John Gossman points out that generalizing Views for a larger application becomes more difficult.
  • A ViewModel is a class that is not a POCO or POJO, but it's still worth the effort.
History

MVC:
  • Designed by Trygve Reenskaug in 1979 while working on Smalltalk-80 at Xerox PARC. The definition has evolved heavily in the following years.
MVP:
  • Proposed by Mike Potel of Taligent Inc. in 1996. Taligent was then a subsidiary of IBM.
MVVM:
  • Defined by John Gossman at Microsoft in 2005 for use with Windows Presentation Foundation.

References and Further Readings