Friday, July 28, 2006
Data Segregation
Enterprise applications usually rely on some kind of framework. A framework can range from simple reusable components, such as a math library, to a more complex solution, such as a mechanism for handling messaging or security. Some of the more complex frameworks require access to external data stores such as a database or a queue.
There are two options for how application code is deployed against the framework's external resources. One is to share the framework's resources among all applications, so that data specific to each application is stored in the same stores and tables.
The other is to store each application's data in its own database domain, creating a physical boundary between applications.
Both approaches have pros and cons, which I discuss below.
The benefits of using the same physical storage for all data are:
- Easy to implement
- Development and support teams only have to look in one place to find the data.
- Easy to maintain, in the sense that updates to the data schema only need to be applied in a limited number of places.
The disadvantages of this solution are:
- Data from different applications is tightly co-located in the same place.
- There is no way to protect the integrity of other applications' data, since one team could affect another team's or application's data.
- Depending on the situation and application behavior, there could be a performance impact, since all applications use the same data storage.
The benefits of using segregated data storage are:
- No need to worry about affecting other applications' data when updating or modifying tables.
- If security is a concern, using the database to enforce separation of sensitive data between applications is easier to implement.
- No cross-application constraint violations. Records from different applications can have the same values but different meanings, and if they are stored in the same table they can violate its constraints.
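The constraint point above can be made concrete with a small sketch. It assumes two hypothetical applications, "billing" and "orders", that each number their messages from 1, and it uses in-memory SQLite databases purely for illustration. The table and function names are made up for this example and are not taken from any particular framework.

```python
import sqlite3

# Hypothetical framework table; both applications use the same layout.
DDL = "CREATE TABLE messages (message_id INTEGER PRIMARY KEY, body TEXT)"


def shared_storage_demo():
    """Both applications write to one table; return the error text, if any."""
    db = sqlite3.connect(":memory:")
    db.execute(DDL)
    db.execute("INSERT INTO messages VALUES (1, 'billing: invoice created')")
    try:
        # The orders application also numbers its messages from 1.
        db.execute("INSERT INTO messages VALUES (1, 'orders: order shipped')")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        db.close()
    return None


def segregated_storage_demo():
    """Each application owns its database; return row counts per application."""
    counts = {}
    for app, body in (("billing", "invoice created"), ("orders", "order shipped")):
        db = sqlite3.connect(":memory:")  # one independent database per application
        db.execute(DDL)
        db.execute("INSERT INTO messages VALUES (1, ?)", (body,))
        counts[app] = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
        db.close()
    return counts
```

In the shared case the second insert is rejected by the primary key, even though the two records mean different things. In the segregated case both applications can use key 1 without knowing about each other.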
The disadvantages of this approach are:
- More work for the support team when they have to maintain all applications, so tools need to be created to alleviate this problem.
- Schema upgrades need to be applied in more places. Depending on how you implement the schema changes, though, this can work in favor of applications that cannot afford to take the update yet.
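The kind of support tool mentioned above could look roughly like the following. This is only a sketch under assumptions: each application's database is an in-memory SQLite connection, and the application names, the `messages` table, and the `upgrade_all` helper are all hypothetical. It applies one schema change to every application's database while letting selected applications stay on the old schema.

```python
import sqlite3


def upgrade_all(databases, ddl, skip=()):
    """Apply one schema change to every application's database except those in skip.

    Returns the names of the applications that were upgraded.
    """
    upgraded = []
    for app, db in databases.items():
        if app in skip:
            continue  # this application keeps the old schema for now
        with db:  # commit on success, roll back on error
            db.execute(ddl)
        upgraded.append(app)
    return upgraded


def columns(db, table):
    """Return the column names of a table."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]


def demo():
    """Upgrade two of three hypothetical applications and report their columns."""
    databases = {
        name: sqlite3.connect(":memory:")
        for name in ("billing", "orders", "reports")
    }
    for db in databases.values():
        db.execute("CREATE TABLE messages (message_id INTEGER PRIMARY KEY, body TEXT)")
    upgraded = upgrade_all(
        databases,
        "ALTER TABLE messages ADD COLUMN priority INTEGER DEFAULT 0",
        skip={"reports"},
    )
    return upgraded, {name: columns(db, "messages") for name, db in databases.items()}
```

Here the hypothetical "reports" application is skipped and keeps its two-column table, which is the flexibility a shared table cannot offer.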
Friday, July 07, 2006
#region ... #endregion
#region my rant
Microsoft introduced the #region ... #endregion directive, which is supposed to be used to organize "like" code into its own region.
However, lately I have been seeing it used as something of an excuse for big classes. Somehow #region makes big classes seem a little more tolerable.
I don't know whether this is what Microsoft intended, but it is clearly not an appropriate solution, to say the least.
The problem with big classes is not only that they are hard on the eyes. There are more basic problems with them:
1. Big classes are usually hard to test.
2. Big classes are usually error-prone: regardless of how we divide a class into regions, we still have to understand its details, and most people comprehend several small chunks better than one large chunk.
3. The use of regions should itself be a signal that the class may have low cohesion and could be broken up into smaller classes.
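Point 3 can be sketched in code. The example below is in Python rather than C#, so the region markers are ordinary comments (some editors fold them, but the language ignores them). All class and method names are hypothetical. The first class groups two unrelated responsibilities into regions, and the two classes after it show the same behaviour split along the region boundaries.

```python
class OrderService:
    """A class whose regions hint at two unrelated responsibilities."""

    # region pricing
    def total(self, prices, tax_rate):
        return round(sum(prices) * (1 + tax_rate), 2)
    # endregion

    # region formatting
    def receipt_line(self, name, amount):
        return f"{name}: {amount:.2f}"
    # endregion


# The same behaviour, split along the region boundaries into cohesive classes.
class PriceCalculator:
    def total(self, prices, tax_rate):
        return round(sum(prices) * (1 + tax_rate), 2)


class ReceiptFormatter:
    def receipt_line(self, name, amount):
        return f"{name}: {amount:.2f}"
```

Each of the smaller classes can now be tested and understood on its own, and no region markers are needed to explain what belongs together.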
The problems with regions are:
1. It's another thing we have to maintain, to make sure everything in a region is actually "related". I have seen many times that regions start out well but begin to "stink" over time.
2. Who decides which regions to create, which code goes into which region, and so on? This is left to each developer's interpretation. Should we have code reviews for regions? ;-)
#endregion my rant